Abstract

A general notion of algebraic conditional plausibility measures is defined. Probability 
measures, ranking functions, possibility measures, and (under the appropriate definitions) 
sets of probability measures can all be viewed as defining algebraic conditional plausibility 
measures. It is shown that the technology of Bayesian networks can be applied to algebraic 
conditional plausibility measures. 






1. Introduction 



Pearl [1988] among others has long argued that Bayesian networks (that is, the dags with- 
out the conditional probability tables) represent important qualitative information about 
uncertainty regarding conditional dependencies and independencies. To the extent that this 
is true, Bayesian networks should make perfect sense for non-probabilistic representations 
of uncertainty. And, indeed, Bayesian networks have been used with $\kappa$ rankings [Spohn 1988]
by Darwiche and Goldszmidt [1994]. It follows from results of Wilson [1994] that the
technology of Bayesian networks can also be applied to possibility measures [Dubois and
Prade 1990].

The question I address in this paper is "What properties of a representation of
uncertainty are required in order for the technology of Bayesian networks to work?" This
question too has been addressed in earlier work (see [Darwiche 1992; Darwiche and
Ginsberg 1992; Friedman and Halpern 1995; Wilson 1994]), although the characterization given
here is somewhat different. Here I represent uncertainty using plausibility measures, as in
[Friedman and Halpern 1995]. To answer the question, I must examine general properties
of conditional plausibility as well as define a notion of plausibilistic independence. Unlike
earlier papers, I enforce a symmetry condition in the definition of conditional independence,
so that, for example, A is independent of B iff B is independent of A. While this property
holds for probability, under the asymmetric definition of independence used in earlier work
it does not necessarily hold for other formalisms. There are also subtle but important
differences between this paper and [Friedman and Halpern 1995] in the notion of conditional
plausibility. The definitions here are simpler but more general; particular attention is paid
here to conditions on when the conditional plausibility must be defined.

The major results here are a general condition, simpler than that given in [Friedman
and Halpern 1995; Wilson 1994], under which a conditional plausibility measure satisfies the
semi-graphoid properties (which means it can be represented using a Bayesian network).
Conditions are also given that suffice for a Bayesian network to be able to quantitatively
represent a plausibility measure; more precisely, conditions are given so that a plausibility
measure can be uniquely reconstructed given conditional plausibility tables for each node in
the Bayesian network. Conditions for quantitative representation by Bayesian networks do
not seem to have been presented in the literature for representations of uncertainty other
than probability (for which the conditions are trivial). A minor additional condition also
suffices to guarantee that d-separation in the network characterizes conditional
independence. All these conditions clearly apply to $\kappa$ rankings and possibility measures. Perhaps
more interestingly, they also apply to sets of probabilities under a novel representation of
such sets as a plausibility measure. This novel representation (and the associated notion of
conditioning) is shown to have some natural properties not shared by other representations.

The rest of the paper is organized as follows. In Section 2, I discuss conditional
plausibility measures. Section 3 introduces algebraic conditional plausibility measures, which are
ones where there is essentially an analogue to $+$ and $\times$. (Putting such an algebraic structure
on uncertainty is not new; it was also done in [Darwiche 1992; Darwiche and Ginsberg 1992;
Friedman and Halpern 1995; Weydert 1994].) Section 4 discusses independence and
conditional independence in conditional plausibility spaces, and shows that algebraic conditional
plausibility measures satisfy the semi-graphoid properties. Finally, in Section 5, Bayesian
networks based on (algebraic) plausibility measures are considered. Combining the fact
that algebraic plausibility measures satisfy the semi-graphoid properties with the results
of [Geiger, Verma, and Pearl 1990], it follows that d-separation in a Bayesian network G
implies conditional independence for all algebraic plausibility measures compatible with G;
a weak richness condition is shown to yield the converse. The paper concludes in Section 6.
Longer proofs are relegated to the appendix.



2. Conditional Plausibility 

2.1 Unconditional Plausibility Measures 

Before getting to conditional plausibility measures, it is perhaps best to consider
unconditional plausibility measures. The basic idea behind plausibility measures is straightforward.
A probability measure maps subsets of a set $W$ to $[0,1]$. Its domain may not consist of
all subsets of $W$; however, it is required to be an algebra. (Recall that an algebra $\mathcal{F}$ over
$W$ is a set of subsets of $W$ containing $W$ and closed under union and complementation,
so that if $U, V \in \mathcal{F}$, then so are $U \cup V$ and $\overline{U}$.) A plausibility measure is more general; it
maps elements in an algebra $\mathcal{F}$ to some arbitrary partially ordered set. If $\mathrm{Pl}$ is a plausibility
measure, then we read $\mathrm{Pl}(U)$ as "the plausibility of set $U$". If $\mathrm{Pl}(U) \le \mathrm{Pl}(V)$, then $V$ is at
least as plausible as $U$. Because the ordering is partial, it could be that the plausibility of
two different sets is incomparable. An agent may not be prepared to say of two sets that
one is more likely than another or that they are equal in likelihood.

Formally, a plausibility space is a tuple $S = (W, \mathcal{F}, \mathrm{Pl})$, where $W$ is a set of worlds, $\mathcal{F}$
is an algebra over $W$, and $\mathrm{Pl}$ maps sets in $\mathcal{F}$ to some set $D$ of plausibility values partially
ordered by a relation $\le_D$ (so that $\le_D$ is reflexive, transitive, and anti-symmetric) that
contains two special elements $\top_D$ and $\bot_D$ such that $\bot_D \le_D d \le_D \top_D$ for all $d \in D$; these
are intended to be the analogues of 1 and 0 for probability. As usual, the ordering $<_D$ is defined
by taking $d_1 <_D d_2$ if $d_1 \le_D d_2$ and $d_1 \ne d_2$. I omit the subscript $D$ from $\le_D$, $<_D$, $\top_D$,
and $\bot_D$ whenever it is clear from context.
There are three requirements on plausibility measures. The first two are obvious
analogues of requirements that hold for other notions of uncertainty: the whole space gets
the maximum plausibility and the empty set gets the minimum plausibility. The third 
requirement says that a set must be at least as plausible as any of its subsets. 

Pl1. $\mathrm{Pl}(\emptyset) = \bot_D$.

Pl2. $\mathrm{Pl}(W) = \top_D$.

Pl3. If $U \subseteq U'$, then $\mathrm{Pl}(U) \le \mathrm{Pl}(U')$.

(In Pl3, I am implicitly assuming that $U, U' \in \mathcal{F}$. Similar assumptions are made
throughout.)

All the standard representations of uncertainty in the literature can be represented as 
plausibility measures. I briefly describe some other representations of uncertainty that will 
be of relevance to this paper. 

Sets of probabilities: One common way of representing uncertainty is by a set of
probability measures. This set is often assumed to be convex (see, for example, [Campos and
Moral 1995; Cousa, Moral, and Walley 1999; Gilboa and Schmeidler 1991; Levi 1985; Walley
1991] for discussion and further references); however, convex sets do not seem appropriate
for representing independence assumptions, so I do not make this restriction here. For
example, if a coin with an unknown probability of heads is tossed twice, and the tosses are known
to be independent, it seems that a reasonable representation is given by the set $\mathcal{P}_0$ consisting
of all measures $\mu_\alpha$, where $\mu_\alpha(hh) = \alpha^2$, $\mu_\alpha(ht) = \mu_\alpha(th) = \alpha(1 - \alpha)$, $\mu_\alpha(tt) = (1 - \alpha)^2$.
Unfortunately, $\mathcal{P}_0$ is not convex. Moreover, its convex hull includes many measures for which
the coin tosses are not independent. It is argued in [Cousa, Moral, and Walley 1999] that
a set of probability measures is behaviorally equivalent to its convex hull. However, even
if we accept this argument, it does not follow that a set and its convex hull are equivalent
insofar as determination of independencies goes.
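The non-convexity claim can be checked with a little exact arithmetic. The sketch below is illustrative only: the biases 1/5 and 4/5 and the helper names (`coin_measure`, `mix`, `tosses_independent`) are assumptions chosen for the example, not part of the paper. It mixes two members of $\mathcal{P}_0$ with equal weight and tests whether the two tosses remain independent under the mixture.

```python
from fractions import Fraction

def coin_measure(alpha):
    """Measure mu_alpha on two independent tosses of a coin with P(heads) = alpha."""
    a = Fraction(alpha)
    return {"hh": a * a, "ht": a * (1 - a), "th": (1 - a) * a, "tt": (1 - a) ** 2}

def mix(mu, nu, weight):
    """Convex combination weight*mu + (1 - weight)*nu."""
    w = Fraction(weight)
    return {o: w * mu[o] + (1 - w) * nu[o] for o in mu}

def tosses_independent(mu):
    """For two binary tosses, independence reduces to mu(hh) = mu(first=h) * mu(second=h)."""
    first_h = mu["hh"] + mu["ht"]
    second_h = mu["hh"] + mu["th"]
    return mu["hh"] == first_h * second_h

mu_lo = coin_measure(Fraction(1, 5))       # illustrative bias
mu_hi = coin_measure(Fraction(4, 5))       # illustrative bias
midpoint = mix(mu_lo, mu_hi, Fraction(1, 2))

print(tosses_independent(mu_lo), tosses_independent(mu_hi))  # both members are product measures
print(midpoint["hh"], tosses_independent(midpoint))          # the mixture is not
```

Under the mixture each toss lands heads with probability 1/2, yet two heads has probability 17/50 rather than 1/4, so the mixture lies in the convex hull of $\mathcal{P}_0$ but not in $\mathcal{P}_0$.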

There are a number of ways of viewing a set $\mathcal{P}$ of probability measures as a plausibility
measure. One uses the lower probability $\mathcal{P}_*$, defined as $\mathcal{P}_*(U) = \inf\{\mu(U) : \mu \in \mathcal{P}\}$. Clearly
$\mathcal{P}_*$ satisfies Pl1-3. The corresponding upper probability $\mathcal{P}^*$, defined as $\mathcal{P}^*(U) = \sup\{\mu(U) : \mu \in
\mathcal{P}\} = 1 - \mathcal{P}_*(\overline{U})$, is also clearly a plausibility measure.

Both $\mathcal{P}_*$ and $\mathcal{P}^*$ give a way of comparing the likelihood of two subsets $U$ and $V$ of $W$.
These two ways are incomparable; it is easy to find a set $\mathcal{P}$ of probability measures on $W$
and subsets $U$ and $V$ of $W$ such that $\mathcal{P}_*(U) < \mathcal{P}_*(V)$ and $\mathcal{P}^*(U) > \mathcal{P}^*(V)$. Rather than
choosing between $\mathcal{P}_*$ and $\mathcal{P}^*$, we can associate a different plausibility measure with $\mathcal{P}$ that
captures both. Let $D_{\mathcal{P}_*,\mathcal{P}^*} = \{(a, b) : 0 \le a \le b \le 1\}$ and define $(a, b) \le (a', b')$ iff $b \le a'$.
This puts a partial order on $D_{\mathcal{P}_*,\mathcal{P}^*}$; clearly $\bot_{D_{\mathcal{P}_*,\mathcal{P}^*}} = (0, 0)$ and $\top_{D_{\mathcal{P}_*,\mathcal{P}^*}} = (1, 1)$. Define
$\mathrm{Pl}_{\mathcal{P}_*,\mathcal{P}^*}(U) = (\mathcal{P}_*(U), \mathcal{P}^*(U))$. Thus, $\mathrm{Pl}_{\mathcal{P}_*,\mathcal{P}^*}$ associates with a set $U$ two numbers which
can be thought of as defining an interval in terms of the lower and upper probability of $U$.
It is easy to check that $\mathrm{Pl}_{\mathcal{P}_*,\mathcal{P}^*}(U) \le \mathrm{Pl}_{\mathcal{P}_*,\mathcal{P}^*}(V)$ if the upper probability of $U$ is less than
or equal to the lower probability of $V$. $\mathrm{Pl}_{\mathcal{P}_*,\mathcal{P}^*}$ satisfies Pl1-3, so it is indeed a plausibility
measure, but one which puts only a partial order on events.

The trouble with $\mathcal{P}_*$, $\mathcal{P}^*$, and even $\mathrm{Pl}_{\mathcal{P}_*,\mathcal{P}^*}$ is that they lose information. For example,
it is not hard to find a set $\mathcal{P}$ of probability measures and subsets $U, V$ of $W$ such that
$\mu(U) \le \mu(V)$ for all $\mu \in \mathcal{P}$ and $\mu(U) < \mu(V)$ for some $\mu \in \mathcal{P}$, but $\mathcal{P}_*(U) = \mathcal{P}_*(V)$ and
$\mathcal{P}^*(U) = \mathcal{P}^*(V)$. Indeed, there exists an infinite set $\mathcal{P}$ of probability measures such that
$\mu(U) < \mu(V)$ for all $\mu \in \mathcal{P}$ but $\mathcal{P}_*(U) = \mathcal{P}_*(V)$ and $\mathcal{P}^*(U) = \mathcal{P}^*(V)$. If all the probability
measures in $\mathcal{P}$ agree that $U$ is less likely than $V$, it seems reasonable to conclude that $U$ is
less likely than $V$. However, none of $\mathcal{P}_*$, $\mathcal{P}^*$, or $\mathrm{Pl}_{\mathcal{P}_*,\mathcal{P}^*}$ will necessarily draw this conclusion.

Fortunately, it is not hard to associate yet another plausibility measure with $\mathcal{P}$ that
does not lose this important information. For technical convenience that will become clear
later, assume that there is some index set $I$ such that $\mathcal{P} = \{\mu_i : i \in I\}$. Thus, for example,
if $\mathcal{P} = \{\mu_1, \ldots, \mu_n\}$, then $I = \{1, \ldots, n\}$. Let $D_I = [0, 1]^I$, that is, the functions from $I$ to
$[0, 1]$, with the pointwise ordering, so that $f \le g$ iff $f(i) \le g(i)$ for all $i \in I$.$^1$ It is easy to
check that $\bot_{D_I}$ is the function $f : I \to [0, 1]$ such that $f(i) = 0$ for all $i \in I$ and $\top_{D_I}$ is the
function $g$ such that $g(i) = 1$ for all $i \in I$. For $U \subseteq W$, let $f_U$ be the function such that
$f_U(i) = \mu_i(U)$ for all $i \in I$. For example, for the set $\mathcal{P}_0$ of measures representing the two
coin tosses (which is indexed by the real numbers $\alpha \in [0,1]$), the set $W$ can be taken to be $\{hh, ht, tt, th\}$. Then,
for example, $f_{\{hh\}}(\alpha) = \mu_\alpha(hh) = \alpha^2$ and $f_{\{ht,tt\}}(\alpha) = 1 - \alpha$.

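It is easy to see that $f_\emptyset = \bot_{D_I}$ and $f_W = \top_{D_I}$. Now define $\mathrm{Pl}_{\mathcal{P}}(U) = f_U$. Thus,
$\mathrm{Pl}_{\mathcal{P}}(U) \le \mathrm{Pl}_{\mathcal{P}}(V)$ iff $f_U(i) \le f_V(i)$ for all $i \in I$ iff $\mu(U) \le \mu(V)$ for all $\mu \in \mathcal{P}$. Clearly $\mathrm{Pl}_{\mathcal{P}}$
satisfies Pl1-3. Pl1 and Pl2 follow since $\mathrm{Pl}_{\mathcal{P}}(\emptyset) = f_\emptyset = \bot_{D_I}$ and $\mathrm{Pl}_{\mathcal{P}}(W) = f_W = \top_{D_I}$,
while Pl3 follows since if $U \subseteq V$ then $\mu(U) \le \mu(V)$ for all $\mu \in \mathcal{P}$. $\mathrm{Pl}_{\mathcal{P}}$ captures all the
information in $\mathcal{P}$ (unlike, say, $\mathcal{P}_*$, which washes much of it away by taking infs).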
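To make the information-loss point concrete, here is a minimal sketch on a three-world space. The three measures and the names `lower`, `upper`, `pl`, and `leq` are assumptions made up for this illustration; the paper only asserts that such examples exist. All three measures agree that $U$ is no more likely than $V$, and one makes it strictly less likely, yet the lower and upper probabilities coincide while $\mathrm{Pl}_{\mathcal{P}}$ keeps the comparison.

```python
from fractions import Fraction as F

# Hypothetical set P of three measures on W = {u, v, w}, indexed by position in the list.
measures = [
    {"u": F(1, 5), "v": F(1, 2), "w": F(3, 10)},
    {"u": F(1, 2), "v": F(1, 2), "w": F(0)},
    {"u": F(1, 5), "v": F(1, 5), "w": F(3, 5)},
]

def prob(mu, event):
    return sum(mu[w] for w in event)

def lower(ms, event):
    """Lower probability: infimum (here a minimum) over the set of measures."""
    return min(prob(mu, event) for mu in ms)

def upper(ms, event):
    """Upper probability: supremum (here a maximum) over the set of measures."""
    return max(prob(mu, event) for mu in ms)

def pl(ms, event):
    """Pl_P(U) = f_U, represented as the tuple (mu_i(U)) over the index set."""
    return tuple(prob(mu, event) for mu in ms)

def leq(f, g):
    """Pointwise ordering on D_I."""
    return all(x <= y for x, y in zip(f, g))

U, V = {"u"}, {"v"}
print(lower(measures, U), lower(measures, V))   # equal lower probabilities
print(upper(measures, U), upper(measures, V))   # equal upper probabilities
print(leq(pl(measures, U), pl(measures, V)), leq(pl(measures, V), pl(measures, U)))
```

The last line shows that $\mathrm{Pl}_{\mathcal{P}}(U) \le \mathrm{Pl}_{\mathcal{P}}(V)$ holds while the converse fails, a distinction the interval representation cannot express here.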

This way of associating a plausibility measure with a set V of probability measures 
generalizes: it provides a way of associating a single plausibility measure with any set of 
plausibility measures; I leave the straightforward details to the reader. 

Possibility measures: A possibility measure $\mathrm{Poss}$ on $W$ is a function mapping subsets
of $W$ to $[0, 1]$ such that $\mathrm{Poss}(W) = 1$, $\mathrm{Poss}(\emptyset) = 0$, and $\mathrm{Poss}(U) = \sup_{w \in U}(\mathrm{Poss}(\{w\}))$, so
that $\mathrm{Poss}(U \cup V) = \max(\mathrm{Poss}(U), \mathrm{Poss}(V))$ [Dubois and Prade 1990]. Clearly a possibility
measure is a plausibility measure.

Ranking functions: An ordinal ranking (or $\kappa$-ranking or ranking function) $\kappa$ on $W$ (as
defined by [Goldszmidt and Pearl 1992], based on ideas that go back to [Spohn 1988])
is a function mapping subsets of $W$ to $\mathbb{N}^* = \mathbb{N} \cup \{\infty\}$ such that $\kappa(W) = 0$, $\kappa(\emptyset) =
\infty$, and $\kappa(U) = \min_{w \in U}(\kappa(\{w\}))$, so that $\kappa(U \cup V) = \min(\kappa(U), \kappa(V))$. Intuitively, a
ranking function assigns a degree of surprise to each subset of worlds in $W$, where 0 means
unsurprising and higher numbers denote greater surprise. It is easy to see that if $\kappa$ is a
ranking function on $W$, then $(W, 2^W, \kappa)$ is a plausibility space, where $x \le_{\mathbb{N}^*} y$ if and only
if $y \le x$ under the usual ordering on the natural numbers. One standard view of a ranking
function, going back to Spohn, is that a ranking of $k$ can be associated with a probability
of $\epsilon^k$, for some fixed (possibly infinitesimal) $\epsilon$. Note that this viewpoint justifies taking
$\kappa(W) = 0$, $\kappa(\emptyset) = \infty$, and $\kappa(U \cup V) = \min(\kappa(U), \kappa(V))$.



1. In the conference version of this paper [Halpern 2000], $D_I$, the range of the plausibility measure, was
taken to be functions from $\mathcal{P}$ to $[0, 1]$, not from the index set $I$ to $[0, 1]$. The difference is mainly cosmetic,
but this representation makes the range independent of $\mathcal{P}$, so that the same plausibility values can be
used for any set of probability measures indexed by $I$.
2.2 Conditional Plausibility Measures

Since Bayesian networks make such heavy use of conditioning, my interest here is not just
plausibility measures, but conditional plausibility measures (cpm's). Given a set $W$ of
worlds, a cpm maps pairs of subsets of $W$ to some partially ordered set $D$. I write $\mathrm{Pl}(U|V)$
rather than $\mathrm{Pl}(U, V)$, in keeping with standard notation for conditioning. In the case of
a probability measure $\mu$, it is standard to take $\mu(U|V)$ to be undefined if $\mu(V) = 0$. In
general, we must make precise what the allowable second arguments are. Thus, I take the
domain of a cpm to have the form $\mathcal{F} \times \mathcal{F}'$ where, intuitively, $\mathcal{F}'$ consists of those sets in
$\mathcal{F}$ on which it makes sense to condition. For example, if we start with an unconditional
probability measure $\mu$, $\mathcal{F}'$ might consist of all sets $V$ such that $\mu(V) > 0$. (Note that $\mathcal{F}'$ is
not an algebra: it is not closed under either intersection or complementation.) A Popper
algebra over $W$ is a set $\mathcal{F} \times \mathcal{F}'$ of subsets of $W \times W$ satisfying the following properties:

Acc1. $\mathcal{F}$ is an algebra over $W$.

Acc2. $\mathcal{F}'$ is a nonempty subset of $\mathcal{F}$.

Acc3. $\mathcal{F}'$ is closed under supersets in $\mathcal{F}$, in that if $V \in \mathcal{F}'$, $V \subseteq V'$, and $V' \in \mathcal{F}$, then
$V' \in \mathcal{F}'$.

(Popper algebras are named after Karl Popper, who was the first to consider formally
conditional probability as the basic notion [Popper 1968]. De Finetti [1936] also did some
early work, apparently independently, taking conditional probabilities as primitive. Indeed,
as Rényi [1964] points out, the idea seems to go back as far as Keynes [1921].)



A conditional plausibility space (cps) is a tuple $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$, where $\mathcal{F} \times \mathcal{F}'$ is a Popper
algebra over $W$, $\mathrm{Pl} : \mathcal{F} \times \mathcal{F}' \to D$, $D$ is a partially ordered set of plausibility values, and $\mathrm{Pl}$
is a conditional plausibility measure (cpm) that satisfies the following conditions:

CPl1. $\mathrm{Pl}(\emptyset|V) = \bot_D$.

CPl2. $\mathrm{Pl}(W|V) = \top_D$.

CPl3. If $U \subseteq U'$, then $\mathrm{Pl}(U|V) \le \mathrm{Pl}(U'|V)$.

CPl4. $\mathrm{Pl}(U|V) = \mathrm{Pl}(U \cap V|V)$.

CPl1-3 are the obvious analogues to Pl1-3. CPl4 is a minimal property that guarantees
that when conditioning on $V$, everything is relativized to $V$. It follows easily from CPl1-4
that $\mathrm{Pl}(\cdot|V)$ is a plausibility measure on $V$ for each fixed $V$. A cps is acceptable if it
satisfies

Acc4. If $V \in \mathcal{F}'$, $U \in \mathcal{F}$, and $\mathrm{Pl}(U|V) \ne \bot_D$, then $U \cap V \in \mathcal{F}'$.

Acceptability is a generalization of the observation that if $\Pr(V) \ne 0$, then conditioning
on $V$ should be defined. It says that if $\mathrm{Pl}(U|V) \ne \bot_D$, then conditioning on $V \cap U$ should
be defined.

This notion of cps is closely related to that defined in [Friedman and Halpern 1995].
There, a conditional plausibility space is defined to be a family $\{(W, D_V, \mathrm{Pl}_V) : V \subseteq W\}$ of
plausibility spaces that satisfies the following coherence condition, which relates conditioning
on two different sets, where $\mathcal{F} = 2^W$ and $\mathcal{F}' = 2^W - \{\emptyset\}$:

CPl5. If $V \cap V' \in \mathcal{F}'$ and $U, U' \in \mathcal{F}$, then $\mathrm{Pl}(U|V \cap V') \le \mathrm{Pl}(U'|V \cap V')$ iff $\mathrm{Pl}(U \cap V|V') \le
\mathrm{Pl}(U' \cap V|V')$.

It is not hard to show that CPl5 implies CPl4.

Lemma 2.1: CPl5 implies CPl4.

Proof: Since clearly $\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U \cap V \cap V|V')$, by CPl5 (applied with $U' = U \cap V$, in both
directions) it follows that $\mathrm{Pl}(U|V \cap V') = \mathrm{Pl}(U \cap V|V \cap V')$, and hence, taking $V' = V$,
$\mathrm{Pl}(U|V) = \mathrm{Pl}(U \cap V|V)$. $\Box$

CPl5 does not follow from CPl1-4 (indeed, as shown below, the standard notion of
conditioning for lower probabilities satisfies CPl1-4 but not CPl5). A cps that satisfies
CPl5 is said to be coherent. Although I do not assume CPl5 here, it in fact holds for all
plausibility measures to which one of the main results applies (see Lemma 3.4).

To distinguish the definition of cps given in this paper from that given in [Friedman and
Halpern 1995], I call the latter an FH-cps. There is no analogue to Acc1-4 in [Friedman and
Halpern 1995]; $\mathcal{F}$ is implicitly taken to be $2^W$, while $\mathcal{F}'$ is implicitly taken to be $2^W - \{\emptyset\}$.
This is an inessential difference between the definitions. More significantly, note that in
an FH-cps, $(W, D_V, \mathrm{Pl}_V)$ is a plausibility space for each fixed $V$, and thus satisfies Pl1-3.
However, requiring CPl1-3 is a priori stronger than requiring Pl1-3 for each separate
plausibility space. Pl1 requires that $\mathrm{Pl}(\emptyset|V) = \bot_{D_V}$, but the elements $\bot_{D_V}$ may be different
for each $V$. By way of contrast, CPl1 requires that $\mathrm{Pl}(\emptyset|V)$ must be the same element
for all $V$. Similar remarks hold for Pl2. Nevertheless, as is shown below, there is a
construction that converts an FH-cps to a coherent cps.

I now consider some standard ways of getting a cps starting with an unconditional
representation of uncertainty. A cpm $\mathrm{Pl}$ extends an unconditional plausibility measure $\mathrm{Pl}'$
if $\mathrm{Pl}(U|W) = \mathrm{Pl}'(U)$. All the constructions given below result in extensions.

Ranking functions: Given an unconditional ranking function $\kappa$, there is a well-known
way of extending it to a conditional ranking function:

\[
\kappa(U|V) = \begin{cases}
\kappa(U \cap V) - \kappa(V) & \text{if } \kappa(V) \ne \infty, \\
\text{undefined} & \text{if } \kappa(V) = \infty.
\end{cases}
\]

This is consistent with the view that if $\kappa(V) = k$, then $\mu(V) = \epsilon^k$, since then $\mu(U|V) =
\epsilon^{\kappa(U \cap V) - \kappa(V)}$. It is easy to check that this definition results in a coherent cps.
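The definition translates directly into code. The following is a minimal sketch, assuming a finite set of worlds and a dictionary of singleton ranks; the names `rank` and `cond_rank` and the sample ranks are hypothetical, and "undefined" is represented by `None`.

```python
import math

def rank(kappa, event):
    """kappa(U) = min of the singleton ranks; the empty set gets rank infinity."""
    return min((kappa[w] for w in event), default=math.inf)

def cond_rank(kappa, u, v):
    """kappa(U|V) = kappa(U & V) - kappa(V), or None (undefined) when kappa(V) is infinite."""
    kv = rank(kappa, v)
    if kv == math.inf:
        return None
    return rank(kappa, u & v) - kv

# Hypothetical singleton ranks: w1 is unsurprising, w3 is quite surprising.
kappa = {"w1": 0, "w2": 1, "w3": 3}
V = {"w2", "w3"}

print(cond_rank(kappa, {"w3"}, V))               # rank of w3 once V is known
print(cond_rank(kappa, set(), V))                # bottom element: infinity (CPl1)
print(cond_rank(kappa, {"w1", "w2", "w3"}, V))   # top element: 0 (CPl2)
print(cond_rank(kappa, {"w1"}, set()))           # undefined: conditioning on a set of rank infinity
```

Note that the plausibility order is the reverse of the numeric order, so rank 0 plays the role of $\top$ and rank $\infty$ the role of $\bot$.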

Possibility measures: There are two standard ways of defining a conditional possibility
measure from an unconditional possibility measure $\mathrm{Poss}$. To distinguish them, I write
$\mathrm{Poss}(U|V)$ for the first approach and $\mathrm{Poss}(U||V)$ for the second approach. According to the
first approach,

\[
\mathrm{Poss}(U|V) = \begin{cases}
\mathrm{Poss}(V \cap U) & \text{if } \mathrm{Poss}(V \cap U) < \mathrm{Poss}(V), \\
1 & \text{if } \mathrm{Poss}(V \cap U) = \mathrm{Poss}(V) > 0, \\
\text{undefined} & \text{if } \mathrm{Poss}(V) = 0.
\end{cases}
\]

The second approach looks more like conditioning in probability:

\[
\mathrm{Poss}(U||V) = \begin{cases}
\mathrm{Poss}(V \cap U)/\mathrm{Poss}(V) & \text{if } \mathrm{Poss}(V) > 0, \\
\text{undefined} & \text{if } \mathrm{Poss}(V) = 0.
\end{cases}
\]
It is easy to show that both definitions result in a coherent cps. (Many other notions of
conditioning for possibility measures can be defined; see, for example, [Fonck 1994]. I focus
on these two because they are the ones most often considered in the literature.)
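The two notions of conditioning differ only in how the conditioning set rescales the result, which a short sketch makes visible. The function names `cond_min` (for $\mathrm{Poss}(U|V)$) and `cond_prod` (for $\mathrm{Poss}(U||V)$) and the singleton possibilities are assumptions for illustration; `None` stands for "undefined".

```python
from fractions import Fraction as F

def poss(pi, event):
    """Poss(U) = max of the singleton possibilities; the empty set gets 0."""
    return max((pi[w] for w in event), default=0)

def cond_min(pi, u, v):
    """First approach, Poss(U|V)."""
    pv = poss(pi, v)
    if pv == 0:
        return None
    puv = poss(pi, u & v)
    return puv if puv < pv else 1

def cond_prod(pi, u, v):
    """Second approach, Poss(U||V), which mirrors probabilistic conditioning."""
    pv = poss(pi, v)
    if pv == 0:
        return None
    return poss(pi, u & v) / pv

# Hypothetical singleton possibilities; the maximum is 1, so Poss(W) = 1.
pi = {"w1": F(1), "w2": F(1, 2), "w3": F(1, 4)}
V = {"w2", "w3"}

print(cond_min(pi, {"w3"}, V), cond_prod(pi, {"w3"}, V))   # the two notions disagree here
print(cond_min(pi, {"w2"}, V), cond_prod(pi, {"w2"}, V))   # and agree when U & V attains Poss(V)
```

In the first line the first approach returns $\mathrm{Poss}(U \cap V)$ unchanged, while the second divides by $\mathrm{Poss}(V)$.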



Sets of probabilities: For a set $\mathcal{P}$ of probabilities, conditioning can be defined for all
the representations of $\mathcal{P}$ as a plausibility measure. But in each case there are subtle choices
involving when conditioning is undefined. For example, one definition of conditional lower
probability is that $\mathcal{P}_*(U|V)$ is $\inf\{\mu(U|V) : \mu \in \mathcal{P},\ \mu(V) \ne 0\}$ if $\mu(V) \ne 0$ for all $\mu \in \mathcal{P}$, and is
undefined otherwise (i.e., if $\mu(V) = 0$ for some $\mu \in \mathcal{P}$). It is easy to check that $\mathcal{P}_*$ defined
this way gives a coherent cpm, as does the corresponding definition of $\mathcal{P}^*$. The problem
with this definition is that it may result in a rather small set $\mathcal{F}'$ for which conditioning
is defined. For example, if for each set $V \ne W$, there is some measure $\mu \in \mathcal{P}$ such that
$\mu(V) = 0$ (which can certainly happen in some nontrivial examples), then $\mathcal{F}' = \{W\}$. As
a consequence, the cps defined in this way is not acceptable (i.e., does not satisfy Acc4) in
general.

The following definition gives a lower probability which is defined on more arguments:

\[
\mathcal{P}_*(U|V) = \begin{cases}
\inf\{\mu(U|V) : \mu \in \mathcal{P},\ \mu(V) \ne 0\} & \text{if } \mu(V) \ne 0 \text{ for some } \mu \in \mathcal{P}, \\
\text{undefined} & \text{if } \mu(V) = 0 \text{ for all } \mu \in \mathcal{P}.
\end{cases}
\]

It is easy to see that this definition agrees with the first one whenever the first is defined and
results, in general, in a larger set $\mathcal{F}'$. Moreover, the resulting cps is acceptable. However,
the second definition does not satisfy CPl5. For example, suppose that $W = \{a, b, c\}$ and
$\mathcal{P} = \{\mu, \mu'\}$, where $\mu(a) = \mu(b) = 0$, $\mu(c) = 1$, $\mu'(a) = 2/3$, $\mu'(b) = 1/3$, and $\mu'(c) = 0$.
Taking $V = \{a, b\}$, $U = \{a\}$, and $U' = \{b\}$, it is easy to see that according to the second
definition, $\mathcal{P}_*(U \cap V|W) = \mathcal{P}_*(U' \cap V|W) = 0$, but $\mathcal{P}_*(U|V) > \mathcal{P}_*(U'|V)$.

For $\mathrm{Pl}_{\mathcal{P}}$, there are two analogous definitions. For the first, $\mathrm{Pl}_{\mathcal{P}}(U|V)$ is defined only if
$\mu(V) > 0$ for all $\mu \in \mathcal{P}$, in which case $\mathrm{Pl}_{\mathcal{P}}(U|V)$ is $f_{U|V}$, where $f_{U|V}(i) = \mu_i(U|V)$. This
definition gives a coherent cps, but again, in general, not one that is acceptable. In this
paper, I focus on the following definition, which does result in an acceptable cps.

First extend $D_I$ by allowing functions which have value $*$ (intuitively, $*$ denotes
undefined). More precisely, let $D'_I$ consist of all functions $f$ from $I$ to $[0, 1] \cup \{*\}$ such that $f(i) \ne *$
for at least one $i \in I$. The idea is to define $\mathrm{Pl}_{\mathcal{P}}(U|V) = f_{U|V}$, where $f_{U|V}(i) = \mu_i(U|V)$
if $\mu_i(V) > 0$ and $*$ otherwise. (Note that this agrees with the previous definition, which
applies only to the situation where $\mu(V) > 0$ for all $\mu \in \mathcal{P}$.) There is a problem though,
one to which I have already alluded. CPl1 says that $f_{\emptyset|V}$ must be $\bot$ for all $V$. Thus, it
must be the case that $f_{\emptyset|V_1} = f_{\emptyset|V_2}$ for all $V_1, V_2 \subseteq W$. But if $\mu_i \in \mathcal{P}$ and $V_1, V_2 \subseteq W$ are
such that $\mu_i(V_1) > 0$ and $\mu_i(V_2) = 0$, then $f_{\emptyset|V_1}(i) = 0$ and $f_{\emptyset|V_2}(i) = *$, so $f_{\emptyset|V_1} \ne f_{\emptyset|V_2}$. A
similar problem arises with CPl2.

To deal with this problem $D'_I$ must be slightly modified. Say that $f \in D'_I$ is equivalent
to $\bot_{D^*_I}$ if $f(i)$ is either 0 or $*$ for all $i \in I$; similarly, $f$ is equivalent to $\top_{D^*_I}$ if $f(i)$ is
either 1 or $*$ for all $i \in I$. (Since, by definition, $f(i) \ne *$ for at least one $i \in I$, an
element cannot be equivalent to both $\top_{D^*_I}$ and $\bot_{D^*_I}$.) Let $D^*_I$ be the same as $D'_I$ except
that all elements equivalent to $\bot_{D^*_I}$ are identified (and viewed as one element) and all
elements equivalent to $\top_{D^*_I}$ are identified. More precisely, let $D^*_I = \{\bot_{D^*_I}, \top_{D^*_I}\} \cup \{f \in D'_I :
f$ is not equivalent to $\top_{D^*_I}$ or $\bot_{D^*_I}\}$. Define the ordering $\le$ on $D^*_I$ by taking $f \le g$ if one of
the following three conditions holds:

• $f = \bot_{D^*_I}$,

• $g = \top_{D^*_I}$,

• neither $f$ nor $g$ is $\bot_{D^*_I}$ or $\top_{D^*_I}$ and for all $i \in I$, either $f(i) = g(i) = *$ or $f(i) \ne *$,
$g(i) \ne *$, and $f(i) \le g(i)$.



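Now define

\[
\mathrm{Pl}_{\mathcal{P}}(U|V) = \begin{cases}
\bot_{D^*_I} & \text{if } \mu(V) \ne 0 \text{ for some } \mu \in \mathcal{P} \text{ and } \mu(V) \ne 0 \text{ implies } \mu(U|V) = 0 \text{ for all } \mu \in \mathcal{P}, \\
\top_{D^*_I} & \text{if } \exists \mu \in \mathcal{P}\,(\mu(V) \ne 0) \text{ and } \forall \mu \in \mathcal{P}\,(\mu(V) \ne 0 \Rightarrow \mu(U|V) = 1), \\
\text{undefined} & \text{if } \mu(V) = 0 \text{ for all } \mu \in \mathcal{P}, \\
f_{U|V} & \text{otherwise.}
\end{cases}
\]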

It is easy to check that this gives a coherent cps. 
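As a sanity check, the three-world example with $\mathcal{P} = \{\mu, \mu'\}$ used above for conditional lower probability can be run through both conditioning notions. This is only a sketch: the encodings `BOT`, `TOP`, and `STAR`, and the function names, are assumptions introduced here, and `None` doubles as the marker for "undefined" results.

```python
from fractions import Fraction as F

BOT, TOP, STAR = "bot", "top", None   # hypothetical encodings of bottom, top, and *

def prob(mu, event):
    return sum(mu[w] for w in event)

def lower_cond(ms, u, v):
    """Second definition of conditional lower probability; None when undefined."""
    vals = [prob(mu, u & v) / prob(mu, v) for mu in ms if prob(mu, v) != 0]
    return min(vals) if vals else None

def pl_cond(ms, u, v):
    """Pl_P(U|V) with values in D*_I: BOT, TOP, a tuple with STAR entries, or None."""
    f = tuple(prob(mu, u & v) / prob(mu, v) if prob(mu, v) != 0 else STAR for mu in ms)
    defined = [x for x in f if x is not STAR]
    if not defined:
        return None
    if all(x == 0 for x in defined):
        return BOT
    if all(x == 1 for x in defined):
        return TOP
    return f

def leq(f, g):
    """The ordering on D*_I given by the three conditions in the text."""
    if f == BOT or g == TOP:
        return True
    if g == BOT or f == TOP:
        return False
    return all((x is STAR and y is STAR) or
               (x is not STAR and y is not STAR and x <= y) for x, y in zip(f, g))

mu = {"a": F(0), "b": F(0), "c": F(1)}
mu_prime = {"a": F(2, 3), "b": F(1, 3), "c": F(0)}
P = [mu, mu_prime]
W, V, U, U2 = {"a", "b", "c"}, {"a", "b"}, {"a"}, {"b"}

# CPl5 (with V' = W) demands that these two comparisons have the same truth value.
lower_lhs = lower_cond(P, U, V) <= lower_cond(P, U2, V)
lower_rhs = lower_cond(P, U & V, W) <= lower_cond(P, U2 & V, W)
pl_lhs = leq(pl_cond(P, U, V), pl_cond(P, U2, V))
pl_rhs = leq(pl_cond(P, U & V, W), pl_cond(P, U2 & V, W))
print(lower_lhs, lower_rhs)   # they differ: lower probability violates CPl5
print(pl_lhs, pl_rhs)         # they agree for Pl_P on this instance
```

This checks a single instance of CPl5, not coherence in general; it simply shows how keeping the per-measure values (with $*$ where conditioning is impossible) avoids the collapse caused by taking infima.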

Plausibility measures: The construction for $\mathrm{Pl}_{\mathcal{P}}$ can be used to convert any FH-cps
to a cps. I demonstrate the idea by showing how to construct a conditional plausibility
measure from an unconditional plausibility measure. Given an unconditional plausibility
space $(W, \mathrm{Pl})$ with range $D$, an FH-cps is constructed in [Friedman and Halpern 1995]
by defining $\mathrm{Pl}(U|V) = \mathrm{Pl}(U \cap V)$. Thus, $D_V = \{d \in D : d \le \mathrm{Pl}(V)\}$ and $\top_{D_V} = \mathrm{Pl}(V)$.
This is not a cps because CPl2 is not satisfied, but it is an FH-cps, since Pl1-3 is satisfied
for each fixed $V$, and so is CPl5. As observed in [Friedman and Halpern 1995], this is in
fact the FH-cps extending $\mathrm{Pl}$ that makes the minimal number of comparisons, in the sense
that if $\mathrm{Pl}'$ is an FH-cps extending $\mathrm{Pl}$ and $\mathrm{Pl}(U|V) \le \mathrm{Pl}(U'|V)$, then $\mathrm{Pl}'(U|V) \le \mathrm{Pl}'(U'|V)$.

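To get a cps, let $D' = \{(d, V) : V \subseteq W,\ d \le \mathrm{Pl}(V),\ \mathrm{Pl}(V) > \bot_D\}$. Say that $(d, V)$ is
equivalent to $\bot_{D^*}$ if $d = \bot_D$; say that $(d, V)$ is equivalent to $\top_{D^*}$ if $d = \mathrm{Pl}(V)$. Now let
$D^* = \{\bot_{D^*}, \top_{D^*}\} \cup \{d \in D' : d$ is not equivalent to $\top_{D^*}$ or $\bot_{D^*}\}$. Then define $d \le_{D^*} d'$
for $d, d' \in D^*$ iff $d = \bot_{D^*}$, $d' = \top_{D^*}$, or there is some $V \subseteq W$ such that $d = (d_1, V)$,
$d' = (d_2, V)$, and $d_1 \le_D d_2$. Finally, for $U, V \in \mathcal{F}$, define

\[
\mathrm{Pl}(U|V) = \begin{cases}
(\mathrm{Pl}(U \cap V), V) & \text{if } \bot_D < \mathrm{Pl}(U \cap V) < \mathrm{Pl}(V), \\
\top_{D^*} & \text{if } \mathrm{Pl}(U \cap V) = \mathrm{Pl}(V) > \bot_D, \\
\bot_{D^*} & \text{if } \mathrm{Pl}(U \cap V) = \bot_D,\ \mathrm{Pl}(V) > \bot_D, \\
\text{undefined} & \text{if } \mathrm{Pl}(V) = \bot_D.
\end{cases}
\]

I leave it to the reader to check that $\mathrm{Pl}$ is a coherent cpm. It is important that $\mathrm{Pl}(U|V)$ is
undefined if $\mathrm{Pl}(V) = \bot_D$; if we tried to extend the construction to $V$ such that $\mathrm{Pl}(V) = \bot_D$,
then we would have $\top_{D^*} = \bot_{D^*}$. This issue did not arise in [Friedman and Halpern 1995],
since there were separate plausibility spaces for each choice of $V$.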



These constructions for extending an unconditional measure of likelihood to a cps have
a property that will prove useful in stating some of the technical results. A cps $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$
is standard if $\mathcal{F}' = \{U : \mathrm{Pl}(U) \ne \bot\}$. Note that all the constructions of cps's given above
result in standard cps's.
3. Algebraic Conditional Plausibility Measures

To be able to carry out the type of reasoning used in Bayesian networks, it does not suffice to
just have conditional plausibility. We need to have analogues of addition and multiplication.
More precisely, there needs to be some way of computing the plausibility of the union of
two disjoint sets in terms of the plausibility of the individual sets and a way of computing
$\mathrm{Pl}(U \cap V|V')$ given $\mathrm{Pl}(U|V \cap V')$ and $\mathrm{Pl}(V|V')$.

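A cps $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ where $\mathrm{Pl}$ has range $D$ is algebraic if it is acceptable and there are
functions $\oplus : D \times D \to D$ and $\otimes : D \times D \to D$ such that the following properties hold:

Alg1. If $U, U' \in \mathcal{F}$ are disjoint and $V \in \mathcal{F}'$ then $\mathrm{Pl}(U \cup U'|V) = \mathrm{Pl}(U|V) \oplus \mathrm{Pl}(U'|V)$.

Alg2. If $U \in \mathcal{F}$, $V \cap V' \in \mathcal{F}'$ then $\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U|V \cap V') \otimes \mathrm{Pl}(V|V')$.

Alg3. $\otimes$ distributes over $\oplus$; more precisely, $a \otimes (b_1 \oplus \cdots \oplus b_n) = (a \otimes b_1) \oplus \cdots \oplus (a \otimes b_n)$
if $(a, b_1), \ldots, (a, b_n), (a, b_1 \oplus \cdots \oplus b_n) \in Dom_{\mathrm{Pl}}(\otimes)$ and $(b_1, \ldots, b_n), (a \otimes b_1, \ldots, a \otimes
b_n) \in Dom_{\mathrm{Pl}}(\oplus)$, where $Dom_{\mathrm{Pl}}(\oplus) = \{(\mathrm{Pl}(U_1|V), \ldots, \mathrm{Pl}(U_n|V)) : U_1, \ldots, U_n \in \mathcal{F}$ are
pairwise disjoint and $V \in \mathcal{F}'\}$ and $Dom_{\mathrm{Pl}}(\otimes) = \{(\mathrm{Pl}(U|V \cap V'), \mathrm{Pl}(V|V')) : U \in
\mathcal{F},\ V \cap V' \in \mathcal{F}'\}$.$^2$ (See below for discussion of $Dom_{\mathrm{Pl}}(\oplus)$ and $Dom_{\mathrm{Pl}}(\otimes)$; in the
sequel, I omit the subscript $\mathrm{Pl}$ if it is clear from context.)

Alg4. If $(a, c), (b, c) \in Dom(\otimes)$, $a \otimes c \le b \otimes c$, and $c \ne \bot$, then $a \le b$.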

I sometimes refer to the cpm $\mathrm{Pl}$ as being algebraic as well.

It may seem more natural to consider a stronger version of Alg4 that applies to all pairs
in $D \times D$, such as

Alg4'. If $a \otimes c \le b \otimes c$ and $c \ne \bot$, then $a \le b$.

However, as Proposition 3.1 below shows, by requiring that Alg3 and Alg4 hold only for
tuples in $Dom(\oplus)$ and $Dom(\otimes)$ rather than on all tuples in $D \times D$, some cps's of interest
become algebraic that would otherwise not be. Intuitively, we care about $\oplus$ and $\otimes$ mainly to the
extent that Alg1 and Alg2 hold, and Alg1 and Alg2 apply only to tuples in $Dom(\oplus)$ and
$Dom(\otimes)$, respectively. Thus, it does not seem unreasonable that Alg4 be required to hold
only for these tuples.

Proposition 3.1: The constructions for extending an unconditional probability measure,
ranking function, possibility measure (using either $\mathrm{Poss}(U|V)$ or $\mathrm{Poss}(U||V)$), and the
plausibility measure $\mathrm{Pl}_{\mathcal{P}}$ defined by a set $\mathcal{P}$ of probability measures to a cps result in algebraic
cps's.$^3$

Proof: It is easy to see that in each case the cps is acceptable. It is also easy to find
appropriate notions of $\oplus$ and $\otimes$ in the case of probability measures, ranking functions, and


2. In the conference version of this paper, $Dom(\oplus)$ was taken to consist only of pairs, not tuples of arbitrary
finite length, and distributivity was considered only for terms of the form $a \otimes (b \oplus b')$. The more general
version considered here is slightly stronger. The reason is that it is possible that $(a, b_1 \oplus \cdots \oplus b_n) \in
Dom(\otimes)$ even though $(a, b_1 \oplus \cdots \oplus b_k) \notin Dom(\otimes)$ for $k < n$. Note also that only left distributivity is
required here.

3. Essentially the same result is proved in [Friedman and Halpern 1995] for all cases but $\mathrm{Pl}_{\mathcal{P}}$.
possibility measures using $\mathrm{Poss}(U||V)$. For probability, clearly $\oplus$ and $\otimes$ are essentially $+$
and $\times$; however, since the range of probability is $[0, 1]$, $a \oplus b$ must be defined as $\min(1, a + b)$,
and Alg3 holds only for $Dom(\oplus) = \{(a_1, \ldots, a_k) : a_1 + \cdots + a_k \le 1\}$; there is no constraint on
$Dom(\times)$; it is $[0, 1] \times [0, 1]$. For ranking, $\oplus$ and $\otimes$ are $\min$ and $+$; there are no constraints
on $Dom(\min)$ and $Dom(+)$. For $\mathrm{Poss}(U||V)$, $\oplus$ is $\max$ and $\otimes$ is $\times$; again, there are no
constraints on $Dom(\max)$ and $Dom(\times)$. I leave it to the reader to check that Alg1-4 hold
in all these cases.
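Part of the check left to the reader can be done by brute force on a small space. The sketch below, under the assumption of a three-world space with made-up singleton ranks and possibilities, verifies Alg1 and Alg2 for conditional ranking functions ($\oplus = \min$, $\otimes = +$) and for $\mathrm{Poss}(U||V)$ ($\oplus = \max$, $\otimes = \times$). All names are hypothetical, and since every singleton here has non-$\bot$ plausibility, $\mathcal{F}'$ is taken to be the nonempty subsets.

```python
import math
from fractions import Fraction as F
from itertools import combinations

WORLDS = ("w1", "w2", "w3")
SUBSETS = [frozenset(c) for r in range(4) for c in combinations(WORLDS, r)]

kappa = {"w1": 0, "w2": 1, "w3": 3}                  # hypothetical singleton ranks
pi = {"w1": F(1), "w2": F(1, 2), "w3": F(1, 4)}      # hypothetical singleton possibilities

def rank(e):
    return min((kappa[w] for w in e), default=math.inf)

def poss(e):
    return max((pi[w] for w in e), default=F(0))

def cond_rank(u, v):      # kappa(U|V); requires v nonempty here
    return rank(u & v) - rank(v)

def cond_poss(u, v):      # Poss(U||V); requires v nonempty here
    return poss(u & v) / poss(v)

def check(cond, plus, times):
    """Brute-force Alg1 and Alg2 over all subsets of WORLDS."""
    for u in SUBSETS:
        for v in SUBSETS:
            for x in SUBSETS:
                # Alg1: u and x disjoint, conditioning on v in F'.
                if v and not (u & x):
                    if cond(u | x, v) != plus(cond(u, v), cond(x, v)):
                        return False
                # Alg2 with V = v and V' = x, requiring V & V' in F'.
                if v & x:
                    if cond(u & v, x) != times(cond(u, v & x), cond(v, x)):
                        return False
    return True

ranking_ok = check(cond_rank, min, lambda a, b: a + b)
possibility_ok = check(cond_poss, max, lambda a, b: a * b)
print(ranking_ok, possibility_ok)
```

A passing check on one small example is of course no proof, but it is a quick way to catch a mis-stated operator.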

For $\mathrm{Poss}(U|V)$, $\oplus$ is again $\max$ and $\otimes$ is $\min$. There are no constraints on $Dom(\max)$;
however, note that $(a, b) \in Dom(\min)$ iff either $a < b$ or $a = 1$. For suppose that $(a, b) =
(\mathrm{Poss}(U|V \cap V'), \mathrm{Poss}(V|V'))$, where $U \in \mathcal{F}$ and $V \cap V' \in \mathcal{F}'$. If $\mathrm{Poss}(U \cap V \cap V') =
\mathrm{Poss}(V \cap V')$ then $a = \mathrm{Poss}(U|V \cap V') = 1$; otherwise, $\mathrm{Poss}(U \cap V \cap V') < \mathrm{Poss}(V \cap V')$, in
which case $a = \mathrm{Poss}(U|V \cap V') = \mathrm{Poss}(U \cap V \cap V') < \mathrm{Poss}(V \cap V') \le \mathrm{Poss}(V|V') = b$. It
is easy to check Alg1-3. While $\min$ does not satisfy Alg4' (certainly $\min(a, c) = \min(b, c)$
does not in general imply that $a = b$), Alg4 does hold. For if $\min(a, c) \le \min(b, c)$ and
$a = 1$, then clearly $b = 1$. Alternatively, if $a < c$, then $\min(a, c) = a$ and the only way that
$a \le \min(b, c)$, given that $b < c$ or $b = 1$, is if $a \le b$.
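The difference between Alg4 and Alg4' for $\min$ can be seen by a grid search. The sketch assumes the characterization of $Dom(\min)$ stated in the proof ($a < b$ or $a = 1$) and searches multiples of 1/10 for a violation of cancellation; the function names and the grid are illustrative choices.

```python
from fractions import Fraction as F

def in_dom_min(a, b):
    """(a, b) is in Dom(min) iff a < b or a = 1, as noted in the proof."""
    return a < b or a == 1

GRID = [F(k, 10) for k in range(11)]

def cancellation_holds(pair_allowed):
    """Search for a violation of: min(a,c) <= min(b,c) and c != 0 imply a <= b."""
    for a in GRID:
        for b in GRID:
            for c in GRID:
                if c == 0:            # c must differ from bottom, which is 0 for possibility
                    continue
                if not (pair_allowed(a, c) and pair_allowed(b, c)):
                    continue
                if min(a, c) <= min(b, c) and not a <= b:
                    return False
    return True

alg4 = cancellation_holds(in_dom_min)                  # restricted to Dom(min)
alg4_prime = cancellation_holds(lambda a, b: True)     # all pairs in D x D
print(alg4, alg4_prime)
```

Without the domain restriction, a triple such as $a = 9/10$, $b = 7/10$, $c = 1/2$ breaks cancellation, because $\min$ discards everything above $c$; such pairs never arise as $(\mathrm{Poss}(U|V \cap V'), \mathrm{Poss}(V|V'))$.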

Finally, for $\mathrm{Pl}_{\mathcal{P}}$, $\oplus$ and $\otimes$ are essentially pointwise addition and multiplication. But
there are a few subtleties. As in the case of probability, $\mathit{Dom}(\oplus)$ consists of sequences
which sum to at most 1 for each index $i$. Care must also be taken in dealing with $\bot_{D^*}$ and
$\top_{D^*}$. More precisely, $\mathit{Dom}(\oplus)$ consists of all tuples $(f_1, \ldots, f_n)$ such that either

1(a). $f_j \ne \top_{D^*}$, $j = 1, \ldots, n$,

1(b). if $f_j, f_k \ne \bot_{D^*}$ for $1 \le j, k \le n$, then $f_j(i) = *$ iff $f_k(i) = *$, for all $i \in I$, and

or

2. there exists $j$ such that $f_j = \top_{D^*}$ and $f_k = \bot_{D^*}$ for $k \ne j$;

$\mathit{Dom}(\otimes)$ consists of pairs $(f, g)$ such that either one of $f$ or $g$ is in $\{\bot_{D^*}, \top_{D^*}\}$ or neither
$f$ nor $g$ is in $\{\bot_{D^*}, \top_{D^*}\}$ and $g(i) \in \{0, *\}$ iff $f(i) = *$. The definition of $\oplus$ is relatively
straightforward. Define $f \oplus \top_{D^*} = \top_{D^*} \oplus f = \top_{D^*}$ and $f \oplus \bot_{D^*} = \bot_{D^*} \oplus f = f$. If
$\{f, g\} \cap \{\bot_{D^*}, \top_{D^*}\} = \emptyset$, then $f \oplus g = h$, where $h(i) = \min(1, f(i) + g(i))$ (taking
$a + * = * + a = *$ and $\min(1, *) = *$). In a similar spirit, define $f \otimes \top_{D^*} = \top_{D^*} \otimes f = f$
and $f \otimes \bot_{D^*} = \bot_{D^*} \otimes f = \bot_{D^*}$; if $\{f, g\} \cap \{\bot_{D^*}, \top_{D^*}\} = \emptyset$, then $f \otimes g = h$, where
$h(i) = f(i) \times g(i)$ (taking $* \times a = a \times * = 0$ if $a \ne *$ and $* \times * = *$). It is important that
$* \times 0 = 0$ and $* \times * = *$ since otherwise Alg3 may not hold. For example, according to Alg3,

$((1/2, *, 1/2) \otimes (a, 0, b)) \oplus ((1/2, *, 1/2) \otimes (a, 0, b)) = ((1/2, *, 1/2) \oplus (1/2, *, 1/2)) \otimes (a, 0, b) = (a, 0, b)$

(since $(1/2, *, 1/2) \oplus (1/2, *, 1/2) = \top_{D^*}$) and, similarly,
$((1/2, *, 1/2) \otimes (a, *, b)) \oplus ((1/2, *, 1/2) \otimes (a, *, b)) = (a, *, b)$. Since $* \times 0 = 0$ and $* \times * = *$,
these equalities hold. I leave it to the
reader to check that, with these definitions, Alg1-4 hold (although note that the restrictions
to $\mathit{Dom}(\oplus)$ and $\mathit{Dom}(\otimes)$ are required for both Alg3 and Alg4 to hold). ∎
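The role of the conventions for $*$ in the two Alg3 instances above can be checked mechanically. The following Python sketch covers only the pointwise case, in which neither argument is the special bottom or top element, and it does not enforce membership in the domains of the two operations. The names STAR, oplus, and otimes, and the sample values a = 1/3 and b = 1/4, are illustrative assumptions rather than notation from the paper.

```python
from fractions import Fraction

STAR = "*"  # illustrative stand-in for the undefined value '*'

def star_add(x, y):
    """Pointwise sum, taking a + * = * + a = * and capping the result at 1."""
    if x == STAR or y == STAR:
        return STAR
    return min(Fraction(1), x + y)

def star_mul(x, y):
    """Pointwise product, taking * x * = * and * x a = a x * = 0 otherwise."""
    if x == STAR and y == STAR:
        return STAR
    if x == STAR or y == STAR:
        return Fraction(0)
    return x * y

def oplus(f, g):
    return tuple(star_add(x, y) for x, y in zip(f, g))

def otimes(f, g):
    return tuple(star_mul(x, y) for x, y in zip(f, g))

a, b = Fraction(1, 3), Fraction(1, 4)          # arbitrary sample values
half = (Fraction(1, 2), STAR, Fraction(1, 2))
g_zero = (a, Fraction(0), b)
g_star = (a, STAR, b)

# Left-hand sides of the two Alg3 instances discussed in the text.
lhs_zero = oplus(otimes(half, g_zero), otimes(half, g_zero))
lhs_star = oplus(otimes(half, g_star), otimes(half, g_star))
```

Under these conventions both left-hand sides evaluate to the right-hand factor itself, which is what Alg3 requires once the sum of the two copies of (1/2, *, 1/2) is identified with the top element.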

Conditional lower probability is not algebraic. For example, it is not hard to construct
pairwise disjoint sets $U_1$, $V_1$, $U_2$, and $V_2$ and a set $\mathcal{P}$ of probability measures such that
$\mathcal{P}_*(U_i) = \mathcal{P}_*(V_i)$ (and $\mathcal{P}^*(U_i) = \mathcal{P}^*(V_i)$) for $i = 1, 2$, but $\mathcal{P}_*(U_1 \cup U_2) \ne \mathcal{P}_*(V_1 \cup V_2)$. That
means there cannot be a function $\oplus$ in the case of lower probability.
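One concrete instance of such a construction can be sketched in Python. The five-world space, the two measures, and all names below are hypothetical choices made for illustration (they are not taken from the paper), and only unconditional lower and upper probabilities are computed.

```python
from fractions import Fraction

# Worlds 0..4; U1 = {0}, V1 = {1}, U2 = {2}, V2 = {3}; world 4 absorbs leftover mass.
MU1 = [Fraction(1, 10), Fraction(3, 10), Fraction(3, 10), Fraction(3, 10), Fraction(0)]
MU2 = [Fraction(3, 10), Fraction(1, 10), Fraction(1, 10), Fraction(1, 10), Fraction(4, 10)]
MEASURES = [MU1, MU2]

def prob(mu, event):
    return sum((mu[w] for w in event), Fraction(0))

def lower(event):
    """Lower probability: the minimum over the set of measures."""
    return min(prob(mu, event) for mu in MEASURES)

def upper(event):
    """Upper probability: the maximum over the set of measures."""
    return max(prob(mu, event) for mu in MEASURES)

U1, V1, U2, V2 = {0}, {1}, {2}, {3}

same_inputs = all(
    lower(u) == lower(v) and upper(u) == upper(v) for u, v in [(U1, V1), (U2, V2)]
)
lower_u_union = lower(U1 | U2)   # 2/5 for these measures
lower_v_union = lower(V1 | V2)   # 1/5 for these measures
```

Since the lower (and upper) probabilities of the pieces coincide while those of the unions differ, no function of the pieces' lower probabilities alone can return the lower probability of the union.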

For later convenience, I list some simple properties of algebraic cps's that show that $\bot$
and $\top$ act like 0 and 1 with respect to addition and multiplication. Let $\mathit{Range}(\mathrm{Pl}) = \{d :
\mathrm{Pl}(U|V) = d$ for some $(U, V) \in \mathcal{F} \times \mathcal{F}'\}$.

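Lemma 3.2: If $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is an algebraic cps, then $d \oplus \bot = \bot \oplus d = d$ for all $d \in
\mathit{Range}(\mathrm{Pl})$.

Proof: Suppose that $d = \mathrm{Pl}(U|V)$. By Alg1, it follows that

$d = \mathrm{Pl}(U|V) = \mathrm{Pl}(U \cup \emptyset|V) = \mathrm{Pl}(U|V) \oplus \mathrm{Pl}(\emptyset|V) = d \oplus \bot$.

A similar argument shows that $d = \bot \oplus d$. ∎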

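Lemma 3.3: If $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is an algebraic cps then, for all $d \in \mathit{Range}(\mathrm{Pl})$,

(a) $d \otimes \top = d$;

(b) if $d \ne \bot$, then $\top \otimes d = d$;

(c) if $d \ne \bot$, then $\bot \otimes d = \bot$;

(d) if $(d, \bot) \in \mathit{Dom}(\otimes)$, then $\top \otimes \bot = d \otimes \bot = \bot \otimes \bot = \bot$.

Proof: Suppose that $d = \mathrm{Pl}(U|V)$. By Alg2, CPl2, and CPl4, it follows that

$d = \mathrm{Pl}(U|V) = \mathrm{Pl}(U \cap V|V) = \mathrm{Pl}(U|V) \otimes \mathrm{Pl}(V|V) = d \otimes \top$.

Similarly, if $d \ne \bot$, then $U \cap V \in \mathcal{F}'$ (by Acc4), so

$d = \mathrm{Pl}(U|V) = \mathrm{Pl}(U \cap V|V) = \mathrm{Pl}(U \cap V|U \cap V) \otimes \mathrm{Pl}(U \cap V|V) = \top \otimes d$.

If $d \ne \bot$, then by Alg2, CPl1, and CPl4,

$\bot = \mathrm{Pl}(\emptyset|V) = \mathrm{Pl}(\emptyset|U \cap V) \otimes \mathrm{Pl}(U|V) = \bot \otimes d$.

Finally, if $(d, \bot) \in \mathit{Dom}(\otimes)$, then there exist $U$, $V$, $V'$ such that $V \cap V' \in \mathcal{F}'$, $\mathrm{Pl}(U|V \cap V') = d$,
and $\mathrm{Pl}(V|V') = \bot$. By Alg2, $\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U|V \cap V') \otimes \mathrm{Pl}(V|V') = d \otimes \bot$. By CPl3,
$\mathrm{Pl}(U \cap V|V') \le \mathrm{Pl}(V|V') = \bot$, so $\mathrm{Pl}(U \cap V|V') = \bot$. Thus, $d \otimes \bot = \bot$. Replacing $U$
with $V \cap V'$, the same argument shows that $\top \otimes \bot = \bot$; replacing $U$ with $\emptyset$, we get that
$\bot \otimes \bot = \bot$. ∎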

I conclude this section by showing that a standard algebraic cps that satisfies one other
minimal property must also satisfy CPl5. Say that $\otimes$ is monotonic if $d \le d'$ and $e \le e'$ imply
$d \otimes e \le d' \otimes e'$. A cpm (cps) is monotonic if $\otimes$ is.

Lemma 3.4: A standard algebraic monotonic cps satisfies CPl5.

Proof: Suppose that $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is a standard algebraic cps and that $V \cap V' \in \mathcal{F}'$. If
$\mathrm{Pl}(U|V \cap V') \le \mathrm{Pl}(U'|V \cap V')$, then it follows from Alg2 and monotonicity that

$\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U|V \cap V') \otimes \mathrm{Pl}(V|V') \le \mathrm{Pl}(U'|V \cap V') \otimes \mathrm{Pl}(V|V') = \mathrm{Pl}(U' \cap V|V')$. (1)

For the opposite implication, suppose that $\mathrm{Pl}(U \cap V|V') \le \mathrm{Pl}(U' \cap V|V')$. Then, by Alg2,

$\mathrm{Pl}(U|V \cap V') \otimes \mathrm{Pl}(V|V') \le \mathrm{Pl}(U'|V \cap V') \otimes \mathrm{Pl}(V|V')$.

Since $V \cap V' \in \mathcal{F}'$ and the cps is standard, it must be the case that $\mathrm{Pl}(V \cap V') \ne \bot$. Hence (by
CPl3), $\mathrm{Pl}(V') \ne \bot$; moreover, $\mathrm{Pl}(V|V') \ne \bot$ (otherwise $\mathrm{Pl}(V \cap V') = \mathrm{Pl}(V|V') \otimes \mathrm{Pl}(V') =
\bot$). Thus, by applying Alg4 to the last inequality, it follows that $\mathrm{Pl}(U|V \cap V') \le \mathrm{Pl}(U'|V \cap V')$. ∎



4. Independence 

How can we capture formally the notion that two events are independent? Intuitively, it
means that they have nothing to do with each other: they are totally unrelated; the
occurrence of one has no influence on the other. None of the representations of uncertainty
that we have been considering can express the notion of "unrelatedness" (whatever it might
mean) directly. The best we can do is to capture the "footprint" of independence on the
notion. For example, in the case of probability, if $U$ and $V$ are unrelated, it seems reasonable
to expect that learning $U$ should not affect the probability of $V$ and, symmetrically,
learning $V$ should not affect the probability of $U$. "Unrelatedness" is, after all, a symmetric
notion.$^4$ The fact that $U$ and $V$ are probabilistically independent (with respect to probability
measure $\mu$) can thus be expressed as $\mu(U|V) = \mu(U)$ and $\mu(V|U) = \mu(V)$. There is a
technical problem with this definition: What happens if $\mu(V) = 0$? In that case $\mu(U|V)$ is
undefined. Similarly, if $\mu(U) = 0$ then $\mu(V|U)$ is undefined. It is conventional to say that,
in this case, $U$ and $V$ are still independent. This leads to the following formal definition.



Definition 4.1: $U$ and $V$ are probabilistically independent (with respect to probability
measure $\mu$) if $\mu(V) \ne 0$ implies $\mu(U|V) = \mu(U)$ and $\mu(U) \ne 0$ implies $\mu(V|U) = \mu(V)$. ∎

This does not look like the standard definition of independence in texts, but an easy 
calculation shows that it is equivalent. 



Proposition 4.2: The following are equivalent:

(a) $\mu(U) \ne 0$ implies $\mu(V|U) = \mu(V)$,

(b) $\mu(U \cap V) = \mu(U)\mu(V)$,

(c) $\mu(V) \ne 0$ implies $\mu(U|V) = \mu(U)$.
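The equivalence of the three conditions can be spot-checked by brute force on a small finite space. The sketch below is only an illustration: the five-world measure (uniform on four worlds, with one world of probability 0 so that the zero-probability convention is exercised) and all function names are assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import combinations

WORLDS = (0, 1, 2, 3, 4)
# Hypothetical measure: uniform on worlds 0..3, probability 0 on world 4.
MU = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4), 4: Fraction(0)}

def prob(event):
    return sum((MU[w] for w in event), Fraction(0))

def cond(u, v):
    """mu(u | v); only called when prob(v) is nonzero."""
    return prob(u & v) / prob(v)

def cond_a(u, v):
    return prob(u) == 0 or cond(v, u) == prob(v)

def cond_b(u, v):
    return prob(u & v) == prob(u) * prob(v)

def cond_c(u, v):
    return prob(v) == 0 or cond(u, v) == prob(u)

EVENTS = [frozenset(c) for r in range(len(WORLDS) + 1) for c in combinations(WORLDS, r)]

# The three conditions should agree on every pair of events.
all_agree = all(cond_a(u, v) == cond_b(u, v) == cond_c(u, v) for u in EVENTS for v in EVENTS)
```

A single measure proves nothing in general, but the check makes the zero-probability convention concrete: an event of probability 0 satisfies all three conditions with every other event.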



4. Walley [1991] calls the asymmetric notion irrelevance and defines U being independent of V as U is
irrelevant to V and V is irrelevant to U. Although my focus here is independence, irrelevance is an
interesting notion in its own right; see [Cozman 199£; Cozman and Walley 199£].

Thus, in the case of probability, it would be equivalent to say that $U$ and $V$ are
independent with respect to $\mu$ if $\mu(U \cap V) = \mu(U)\mu(V)$ or to require only that $\mu(U|V) = \mu(U)$
if $\mu(V) \ne 0$ without requiring that $\mu(V|U) = \mu(V)$ if $\mu(U) \ne 0$. However, these
equivalences do not necessarily hold for other representations of uncertainty. The definition of
independence I have given here seems to generalize more appropriately.$^5$

The definition of probabilistic conditional independence is analogous. 

Definition 4.3: $U$ and $V$ are probabilistically independent given $V'$ (with respect to
probability measure $\mu$) if $\mu(V \cap V') \ne 0$ implies $\mu(U|V \cap V') = \mu(U|V')$ and $\mu(U \cap V') \ne 0$ implies
$\mu(V|U \cap V') = \mu(V|V')$. ∎

It is immediate that $U$ and $V$ are (probabilistically) independent iff they are independent
conditional on $W$.

The generalization to conditional plausibility measures (and hence to all other
representations of uncertainty that we have been considering) is straightforward.

Definition 4.4: Given a cps $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$, $U, V \in \mathcal{F}$ are plausibilistically independent
given $V' \in \mathcal{F}$ (with respect to the cpm $\mathrm{Pl}$), written $I_{\mathrm{Pl}}(U, V|V')$, if $V \cap V' \in \mathcal{F}'$ implies
$\mathrm{Pl}(U|V \cap V') = \mathrm{Pl}(U|V')$ and $U \cap V' \in \mathcal{F}'$ implies $\mathrm{Pl}(V|U \cap V') = \mathrm{Pl}(V|V')$. ∎

We are interested in conditional independence of random variables as well as in
conditional independence of events. All the standard definitions extend to plausibility in a
straightforward way. A random variable $X$ on $W$ is a function from $W$ to the reals. Let
$\mathcal{R}(X)$ be the set of possible values for $X$ (that is, the set of values over which $X$ ranges). As
usual, $X = x$ is the event $\{w : X(w) = x\}$. If $\mathbf{X} = \{X_1, \ldots, X_k\}$ is a set of random variables
and $\mathbf{x} = (x_1, \ldots, x_k)$, let $\mathbf{X} = \mathbf{x}$ be an abbreviation for the event $X_1 = x_1 \cap \ldots \cap X_k = x_k$.
A random variable is measurable with respect to cps $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ if $X = x \in \mathcal{F}$ for all
$x \in \mathcal{R}(X)$. For the rest of the paper, I assume that all random variables $X$ are
measurable and that $\mathcal{R}(X)$ is finite for all random variables $X$. Random variables $X$ and $Y$ are
independent with respect to plausibility measure $\mathrm{Pl}$ if the events $X = x$ and $Y = y$ are
independent for all $x \in \mathcal{R}(X)$ and $y \in \mathcal{R}(Y)$. More generally, given sets $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ of
random variables, $\mathbf{X}$ and $\mathbf{Y}$ are plausibilistically independent given $\mathbf{Z}$ (with respect to $\mathrm{Pl}$),
denoted $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$, if $I_{\mathrm{Pl}}(\mathbf{X} = \mathbf{x}, \mathbf{Y} = \mathbf{y}|\mathbf{Z} = \mathbf{z})$ for all $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$. (Note that I am using
$I_{\mathrm{Pl}}$ for conditional independence of events and $I^{rv}_{\mathrm{Pl}}$ for conditional independence of random
variables.) If $\mathbf{Z} = \emptyset$, then $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ if $\mathbf{X}$ and $\mathbf{Y}$ are unconditionally independent, that
is, if $I_{\mathrm{Pl}}(\mathbf{X} = \mathbf{x}, \mathbf{Y} = \mathbf{y}|W)$ for all $\mathbf{x}$, $\mathbf{y}$; if either $\mathbf{X} = \emptyset$ or $\mathbf{Y} = \emptyset$, then $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ is taken
to be vacuously true.

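Now consider the following four properties of random variables, called the semi-graphoid
properties [Pearl 1988], where $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ are pairwise disjoint sets of variables.

CIRV1. If $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ then $I^{rv}_{\mathrm{Pl}}(\mathbf{Y}, \mathbf{X}|\mathbf{Z})$.

CIRV2. If $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}'|\mathbf{Z})$ then $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$.

CIRV3. If $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}'|\mathbf{Z})$ then $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Y}' \cup \mathbf{Z})$.

CIRV4. If $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ and $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}'|\mathbf{Y} \cup \mathbf{Z})$ then $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}'|\mathbf{Z})$.

5. Another property of probabilistic independence is that if $U$ is independent of $V$ then $U$ is independent
of $\overline{V}$. This too does not follow for the other representations of uncertainty, and Walley [1991] actually
makes this part of his definition. Adding this requirement would not affect any of the results here,
although it would make the proofs somewhat lengthier, so I have not made it part of the definition.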

It is well known that CIRV1-4 hold for probability measures. The following result 
generalizes this. The proof is not difficult, although care must be taken to show that the 
result depends only on the properties of algebraic cpms. 

Theorem 4.5: CIRV1-4 hold for all algebraic cps's. 

Proof: See the appendix. | 



Theorem 4.5, of course, is very dependent on the definition of conditional independence
given here. Other notions of independence have been studied in the literature for specific
representations of uncertainty. Perhaps the most common tries to generalize the
observation that if $U$ and $V$ are probabilistically independent, then $\mu(U \cap V) = \mu(U) \times \mu(V)$.
Zadeh [1978] considered this approach in the context of possibility measures, calling it
noninteractivity, but it clearly makes sense for any algebraic cpm.

Definition 4.6: $U$ and $V$ do not interact given $V'$ (with respect to the algebraic cpm $\mathrm{Pl}$),
denoted $NI_{\mathrm{Pl}}(U, V|V')$, if $V' \in \mathcal{F}'$ implies that $\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U|V') \otimes \mathrm{Pl}(V|V')$.$^6$ ∎



Fonck [1994] shows that noninteraction is strictly weaker than independence for a
number of notions of independence for possibility measures. The following result shows that
independence implies noninteraction for all algebraic cpms.

Lemma 4.7: If $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is an algebraic cps, then $I_{\mathrm{Pl}}(U, V|V')$ implies $NI_{\mathrm{Pl}}(U, V|V')$.

Proof: Suppose that $V' \in \mathcal{F}'$ and $I_{\mathrm{Pl}}(U, V|V')$ holds. If $V \cap V' \in \mathcal{F}'$ then, from Alg2, it
follows that

$\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U|V \cap V') \otimes \mathrm{Pl}(V|V') = \mathrm{Pl}(U|V') \otimes \mathrm{Pl}(V|V')$.

On the other hand, if $V \cap V' \notin \mathcal{F}'$, then by Acc4, $\mathrm{Pl}(V|V') = \bot$. By CPl3, $\mathrm{Pl}(U \cap V|V') = \bot$,
and by Lemma 3.3, $\mathrm{Pl}(U|V') \otimes \mathrm{Pl}(V|V') = \bot$. Thus, $\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U|V') \otimes \mathrm{Pl}(V|V')$. ∎



What about the converse to Lemma 4.7? The results of Fonck show that it does not hold
in general; indeed, it does not hold for $\mathrm{Poss}(U|V)$. So what is required for noninteractivity
to imply independence? The following lemma provides a sufficient condition.

Lemma 4.8: If $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is a standard algebraic cps that satisfies Alg4', then $NI_{\mathrm{Pl}}(U, V|V')$
implies $I_{\mathrm{Pl}}(U, V|V')$.

6. Shenoy [1994] defines a notion similar in spirit to noninteractivity for random variables.

Proof: Suppose that $V \cap V' \in \mathcal{F}'$ and $NI_{\mathrm{Pl}}(U, V|V')$. Then by Alg2,

$\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U|V \cap V') \otimes \mathrm{Pl}(V|V')$. (2)

By Acc3, $V' \in \mathcal{F}'$, so $NI_{\mathrm{Pl}}(U, V|V')$ implies

$\mathrm{Pl}(U \cap V|V') = \mathrm{Pl}(U|V') \otimes \mathrm{Pl}(V|V')$. (3)

Since $V \cap V' \in \mathcal{F}'$ and $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is standard, $\mathrm{Pl}(V \cap V') \ne \bot$. Since $\mathrm{Pl}(V \cap V') =
\mathrm{Pl}(V|V') \otimes \mathrm{Pl}(V')$, it follows from Lemma 3.3 that $\mathrm{Pl}(V|V') \ne \bot$. So, by Alg4', (2),
and (3), it follows that $\mathrm{Pl}(U|V \cap V') = \mathrm{Pl}(U|V')$. An identical argument shows that
$\mathrm{Pl}(V|U \cap V') = \mathrm{Pl}(V|V')$ if $U \cap V' \in \mathcal{F}'$. Thus, $I_{\mathrm{Pl}}(U, V|V')$. ∎



Lemmas 4.7 and 4.8 show why noninteractivity and independence coincide for
conditional probability defined from unconditional probability, ranking functions, and possibility
measures using $\mathrm{Poss}(U||V)$. Moreover, they suggest why they do not coincide in general.
Since neither $\mathrm{Poss}(U|V)$ nor $\mathrm{Pl}_{\mathcal{P}}$ satisfy Alg4', it is perhaps not surprising that in neither
case does noninteractivity imply conditional independence. (We shall shortly see an
example in the case of $\mathrm{Pl}_{\mathcal{P}}$; Fonck [1994] gives examples in the case of $\mathrm{Poss}(U|V)$.) Indeed,
noninteractivity may not even imply conditional independence for an arbitrary conditional
probability measure, as the following example shows.

Example 4.9: Suppose that $W = \{a, b\}$, $\mathcal{F} = 2^W$, $\mathcal{F}' = \mathcal{F} - \{\emptyset\}$, $\mu(a) = 1$, $\mu(b) = 0$, but
$\mu(b|b) = 1$. It is easy to see that $\{b\}$ is not independent of itself, but $\{b\}$ does not interact
with $\{b\}$, since $\mu(b) = \mu(b) \times \mu(b)$. Nevertheless, it is not hard to check that this conditional
probability measure $\mu$ is algebraic and, in fact, satisfies Alg4'. However, it is not standard,
since $\{b\} \in \mathcal{F}'$ although $\mu(b) = 0$. ∎



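It is easy to see that the assumption of standardness is necessary in Lemma 4.8. For
suppose that $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is an arbitrary nonstandard algebraic cps for which $\top \ne \bot$.
Since $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is nonstandard, there must exist some $U \in \mathcal{F}'$ such that $\mathrm{Pl}(U|W) = \bot$.
But then

$\bot = \mathrm{Pl}(\emptyset|W) = \mathrm{Pl}(\emptyset|U) \otimes \mathrm{Pl}(U|W) = \bot \otimes \bot$.

Thus

$\mathrm{Pl}(U|W) = \bot = \bot \otimes \bot = \mathrm{Pl}(U|W) \otimes \mathrm{Pl}(U|W)$,

so $NI_{\mathrm{Pl}}(U, U|W)$. But $\mathrm{Pl}(U|U) = \top \ne \bot = \mathrm{Pl}(U)$, so $I_{\mathrm{Pl}}(U, U|W)$ does not hold.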

In general, Theorem 4.5 does not hold if we use $NI_{\mathrm{Pl}}$ rather than $I_{\mathrm{Pl}}$. Besides
noninteractivity, a number of different approaches to defining independence for possibility measures
[Campos and Huete 1999a; Campos and Huete 1999b; Dubois, Farinas del Cerro, Herzig,
and Prade 1994; Fonck 1994] and for sets of probability measures [Campos and Huete 1993;
Campos and Moral 1995; Couso, Moral, and Walley 1999] have been considered. In
general, Theorem 4.5 does not hold for them either. It is beyond the scope of this paper to
discuss and compare these approaches to that considered here, but it is instructive to
consider independence for sets of probability measures in a little more detail, especially for the
representation $\mathrm{Pl}_{\mathcal{P}}$.

$I_{\mathrm{Pl}_{\mathcal{P}}}$ is very close to a notion called type-1 independence considered by de Campos and
Moral [1995]. $U$ and $V$ are type-1 independent conditional on $V'$ with respect to $\mathcal{P}$ if $U$
and $V$ are independent conditional on $V'$ with respect to every $\mu \in \mathcal{P}$. It is easy to check
that $I_{\mathrm{Pl}_{\mathcal{P}}}(U, V|V')$ implies that $U$ and $V$ are type-1 independent conditional on $V'$ (and
similarly for random variables); however, the converse does not necessarily hold, because
the two approaches treat conditioning on events that have probability 0 according to some
(but not all) of the measures in $\mathcal{P}$ differently. To see this, consider an example discussed
by de Campos and Moral. Suppose a coin is known to be either double-headed or
double-tailed and is tossed twice. This can be represented by $\mathcal{P} = \{\mu_0, \mu_1\}$, where $\mu_0(hh) = 1$
and $\mu_0(ht) = \mu_0(th) = \mu_0(tt) = 0$, while $\mu_1(tt) = 1$ and $\mu_1(ht) = \mu_1(th) = \mu_1(hh) = 0$.
Let $X_1$ and $X_2$ be the random variables representing the outcome of the first and second
coin tosses, respectively. Clearly there is a functional dependence between $X_1$ and $X_2$, but
it is easy to check that $X_1$ and $X_2$ are type-1 independent with respect to $\mathcal{P}$. Moreover,
noninteractivity holds: $NI_{\mathrm{Pl}_{\mathcal{P}}}(X_1 = i, X_2 = j)$ holds for $i, j \in \{h, t\}$. On the other hand,
$I^{rv}_{\mathrm{Pl}_{\mathcal{P}}}(X_1, X_2)$ does not hold. For example, $f_{X_1 = h}(1) = 0$ while $f_{X_1 = h|X_2 = h}(1) = *$.$^7$
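The coin example can be replayed in a few lines of Python. The sketch below makes two assumptions that are not spelled out in this passage: the conditional plausibility of U given V is represented as the tuple of conditional probabilities, one per measure, with "*" wherever the conditioning event has probability 0, and an event is an admissible conditioning event if at least one measure gives it positive probability. All names are illustrative.

```python
from fractions import Fraction

STAR = "*"
WORLDS = frozenset({"hh", "ht", "th", "tt"})
MU0 = {"hh": Fraction(1), "ht": Fraction(0), "th": Fraction(0), "tt": Fraction(0)}
MU1 = {"hh": Fraction(0), "ht": Fraction(0), "th": Fraction(0), "tt": Fraction(1)}
MEASURES = [MU0, MU1]

def prob(mu, event):
    return sum((mu[w] for w in event), Fraction(0))

def cond(mu, u, v):
    """mu(u | v), or STAR if mu(v) = 0."""
    pv = prob(mu, v)
    return STAR if pv == 0 else prob(mu, u & v) / pv

def pl(u, v):
    """Assumed tuple representation of the plausibility of u given v."""
    return tuple(cond(mu, u, v) for mu in MEASURES)

def admissible(event):
    """Assumed test for being an allowed conditioning event."""
    return any(prob(mu, event) != 0 for mu in MEASURES)

def type1_independent(u, v):
    """Definition 4.1 applied separately to every measure in the set."""
    for mu in MEASURES:
        if prob(mu, v) != 0 and cond(mu, u, v) != prob(mu, u):
            return False
        if prob(mu, u) != 0 and cond(mu, v, u) != prob(mu, v):
            return False
    return True

def pl_independent(u, v):
    """Definition 4.4 with V' = W, using the tuple representation."""
    if admissible(v) and pl(u, v) != pl(u, WORLDS):
        return False
    if admissible(u) and pl(v, u) != pl(v, WORLDS):
        return False
    return True

X1_H = frozenset({"hh", "ht"})   # event X1 = h
X2_H = frozenset({"hh", "th"})   # event X2 = h
```

With these definitions, the events X1 = h and X2 = h are type-1 independent, while the two tuples compared by `pl_independent` differ exactly in the component for the second measure, matching the 0 versus * discrepancy noted in the text.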



5. Bayesian Networks 



Throughout this section, I assume that we start with a set $W$ of possible worlds characterized
by a set $\mathcal{X} = \{X_1, \ldots, X_n\}$ of $n$ binary random variables. That is, a world in $W$ is a tuple
$(x_1, \ldots, x_n)$ with $x_i \in \{0, 1\}$, and $X_i(x_1, \ldots, x_n) = x_i$; that is, the value of $X_i$ in world
$w = (x_1, \ldots, x_n)$ is $x_i$.$^8$ The goal of this section is to show that many of the tools of
Bayesian network technology can be applied in this setting. The proofs of the main results
all proceed in essentially the same spirit as well-known results for probabilistic Bayesian
networks (see [Geiger and Pearl 1988; Geiger, Verma, and Pearl 1990; Verma 1986]).



5.1 Qualitative Bayesian Networks 

As usual, a (qualitative) Bayesian network (over $\mathcal{X}$) is a dag whose nodes are labeled by
variables in $\mathcal{X}$. The standard notion of a Bayesian network representing a probability
measure [Pearl 1988] can be generalized in the obvious way to plausibility.

Definition 5.1: Given a qualitative Bayesian network $G$, let $\mathrm{Par}_G(X)$ be the parents of the
random variable $X$ in $G$; let $\mathrm{Desc}_G(X)$ be all the descendants of $X$, that is, $X$ and all those
nodes $Y$ such that $X$ is an ancestor of $Y$; let $\mathrm{ND}_G(X)$, the nondescendants of $X$, consist
of $\mathcal{X} - \mathrm{Desc}_G(X)$. Note that all ancestors of $X$ are nondescendants of $X$. The Bayesian
network $G$ is compatible with the cps $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ (or just compatible with $\mathrm{Pl}$, if the other
components of the cps are clear from context) if $I^{rv}_{\mathrm{Pl}}(X, \mathrm{ND}_G(X)|\mathrm{Par}_G(X))$, that is, if $X$ is
conditionally independent of its nondescendants given its parents, for all $X \in \mathcal{X}$. ∎

7. As Peter Walley [private communication, 2000] points out, this example is somewhat misleading. The
definition of independence with respect to $\mathrm{Pl}_{\mathcal{P}}$ produces the same counterintuitive behavior as type-1
independence if the probabilities are modified slightly so as to make them positive, i.e., when there
is "almost functional dependence" between the two variables. For example, suppose that the coin in
the example is known to either land heads with probability .99 or .01 (rather than 1 and 0, as in the
example). Let $\mu'_0$ and $\mu'_1$ be the obvious modifications of $\mu_0$ and $\mu_1$ required to represent this situation,
and let $\mathcal{P}' = \{\mu'_0, \mu'_1\}$. It is easy to check that $X_1$ and $X_2$ continue to be type-1 independent, and
noninteractivity continues to hold, but now $I^{rv}_{\mathrm{Pl}_{\mathcal{P}'}}(X_1, X_2)$ also holds. The real problem is that this
representation of uncertainty does not enable learning.

8. The assumption that the random variables are binary is just for ease of exposition. It is easy to generalize
the results to the case where $\mathcal{R}(X_i)$ is finite for each $X_i$.

There is a standard way of constructing a Bayesian network that represents a
probability measure [Pearl 1988]. I briefly review the construction here, since it works without
change for an algebraic cpm. Given an algebraic cpm $\mathrm{Pl}$, let $Y_1, \ldots, Y_n$ be a
permutation of the random variables in $\mathcal{X}$. Construct a qualitative Bayesian network $G_{\mathrm{Pl},(Y_1, \ldots, Y_n)}$
as follows: For each $k$, find a minimal subset of $\{Y_1, \ldots, Y_{k-1}\}$, call it $\mathbf{P}_k$, such that
$I^{rv}_{\mathrm{Pl}}(\{Y_1, \ldots, Y_{k-1}\}, Y_k|\mathbf{P}_k)$. Then add edges from each of the nodes in $\mathbf{P}_k$ to $Y_k$. Verma
[1986] shows that this construction gives a Bayesian network that is compatible with $\mathrm{Pl}$ in
the case that $\mathrm{Pl}$ is a probability measure; his proof depends only on CIRV1-4. Thus, the
construction works for algebraic cpms.
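For the probabilistic special case, the construction just described can be sketched directly. The code below is an illustration under stated assumptions, not the paper's own procedure: it uses a hypothetical joint distribution for a chain A -> B -> C with made-up numbers, tests independence with Definition 4.3 on events, and searches candidate parent sets by increasing size (a smallest set is in particular a minimal one). Variables are identified by their index in a world tuple.

```python
from fractions import Fraction
from itertools import combinations, product

def make_chain():
    """Hypothetical joint distribution over binary (A, B, C) with A -> B -> C."""
    p_a = {0: Fraction(1, 2), 1: Fraction(1, 2)}
    p_b1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}   # P(B = 1 | A = a)
    p_c1 = {0: Fraction(1, 3), 1: Fraction(2, 3)}   # P(C = 1 | B = b)
    joint = {}
    for a, b, c in product((0, 1), repeat=3):
        pb = p_b1[a] if b == 1 else 1 - p_b1[a]
        pc = p_c1[b] if c == 1 else 1 - p_c1[b]
        joint[(a, b, c)] = p_a[a] * pb * pc
    return joint

def event(joint, variables, values):
    """The set of worlds in which each listed variable takes the listed value."""
    return frozenset(w for w in joint if all(w[i] == v for i, v in zip(variables, values)))

def mu(joint, ev):
    return sum((joint[w] for w in ev), Fraction(0))

def indep_events(joint, u, v, vp):
    """Definition 4.3: u and v are independent given vp."""
    if mu(joint, v & vp) != 0 and \
            mu(joint, u & v & vp) / mu(joint, v & vp) != mu(joint, u & vp) / mu(joint, vp):
        return False
    if mu(joint, u & vp) != 0 and \
            mu(joint, u & v & vp) / mu(joint, u & vp) != mu(joint, v & vp) / mu(joint, vp):
        return False
    return True

def indep_rv(joint, xs, ys, zs):
    """Independence of variable sets: check every combination of values."""
    for x in product((0, 1), repeat=len(xs)):
        for y in product((0, 1), repeat=len(ys)):
            for z in product((0, 1), repeat=len(zs)):
                if not indep_events(joint, event(joint, xs, x),
                                    event(joint, ys, y), event(joint, zs, z)):
                    return False
    return True

def build_network(joint, order):
    """Map each variable to a smallest set of predecessors that screens it off."""
    parents = {}
    for k, yk in enumerate(order):
        preds = list(order[:k])
        for size in range(len(preds) + 1):
            found = next((list(s) for s in combinations(preds, size)
                          if indep_rv(joint, preds, [yk], list(s))), None)
            if found is not None:
                parents[yk] = found
                break
    return parents

PARENTS = build_network(make_chain(), [0, 1, 2])
```

For this distribution and the ordering (A, B, C), the search recovers the chain: A has no parents, B has parent A, and C has parent B. A different ordering may yield a different, possibly denser, graph.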

Theorem 5.2: $G_{\mathrm{Pl},(Y_1, \ldots, Y_n)}$ is compatible with $\mathrm{Pl}$.

Proof: For ease of notation in the proof, I write $G$ instead of $G_{\mathrm{Pl},(Y_1, \ldots, Y_n)}$. Note that
$Y_1, \ldots, Y_n$ represents a topological sort of $G$; edges always go from nodes in $\{Y_1, \ldots, Y_{k-1}\}$
to $Y_k$. It follows that $G$ is acyclic; i.e., it is a dag. The construction guarantees that
$\mathbf{P}_k = \mathrm{Par}_G(Y_k)$ and that $I^{rv}_{\mathrm{Pl}}(\{Y_1, \ldots, Y_{k-1}\}, Y_k|\mathrm{Par}_G(Y_k))$. It follows from results of [Verma
1986] (and is not hard to verify directly) that $I^{rv}_{\mathrm{Pl}}(\mathrm{ND}_G(Y_k), Y_k|\mathrm{Par}_G(Y_k))$ can be proved
using only CIRV1-4. The result now follows from Theorem 4.5. ∎



5.2 Quantitative Bayesian Networks 

A qualitative Bayesian network $G$ gives qualitative information about dependence and
independence, but does not actually give the values of the conditional plausibilities. To provide
the more quantitative information, we associate with each node $X$ in $G$ a conditional
plausibility table (cpt) that quantifies the effects of the parents of $X$ on $X$. A cpt for $X$ gives,
for each setting of $X$'s parents in $G$, the plausibility that $X = 0$ and $X = 1$ given that
setting. For example, if $X$'s parents in $G$ are $Y$ and $Z$, then the cpt for $X$ would have an
entry denoted $d_{X=i|Y=j \cap Z=k}$ for all $(i, j, k) \in \{0, 1\}^3$. As the notation is meant to suggest,
$d_{X=i|Y=j \cap Z=k} = \mathrm{Pl}(X = i|Y = j \cap Z = k)$ for the plausibility measure $\mathrm{Pl}$ represented by
$G$.$^9$ For each fixed $j$ and $k$, we assume that $x_{0jk} \oplus x_{1jk} = \top$ (writing $x_{ijk}$ for $d_{X=i|Y=j \cap Z=k}$). A quantitative Bayesian
network is a pair $(G, f)$ consisting of a qualitative Bayesian network $G$ and a function $f$
that associates with each node $X$ in $G$ a cpt for $X$.

Definition 5.3: A quantitative Bayesian network $(G, f)$ represents $\mathrm{Pl}$ if $G$ is compatible
with $\mathrm{Pl}$ and the cpts agree with $\mathrm{Pl}$, in the sense that, for each random variable $X$, the entry
$d_{X=i|Y_1=j_1, \ldots, Y_k=j_k}$ in the cpt is $\mathrm{Pl}(X = i|Y_1 = j_1 \cap \ldots \cap Y_k = j_k)$ if $Y_1 = j_1 \cap \ldots \cap Y_k = j_k \in \mathcal{F}'$.
(It does not matter what $d_{X=i|Y_1=j_1, \ldots, Y_k=j_k}$ is if $Y_1 = j_1 \cap \ldots \cap Y_k = j_k \notin \mathcal{F}'$.) ∎

Given a cpm $\mathrm{Pl}$, it is easy to construct a quantitative Bayesian network $(G, f)$ that
represents $\mathrm{Pl}$: simply construct $G$ that is compatible with $\mathrm{Pl}$ as in Theorem 5.2 and define
$f$ appropriately, using $\mathrm{Pl}$. The more interesting question is whether there is a unique
algebraic cpm determined by a quantitative Bayesian network. As stated, this question is
somewhat undetermined. The numbers in a quantitative network do not say what $\oplus$ and
$\otimes$ ought to be for the algebraic cpm.

9. Of course, if the random variables are not binary, $i$, $j$, $k$ have to range over all possible values for the
random variables.

A reasonable way to make the question more interesting is the following. Recall that,
for the purposes of this section, I have taken $W$ to consist of the $2^n$ worlds characterized
by the $n$ binary random variables in $\mathcal{X}$. Let $\mathcal{PL}_{D,\oplus,\otimes}$ consist
of all standard algebraic cps's of the form $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$, where $\mathcal{F} = 2^W$, so that all subsets
of $W$ are measurable, the range of $\mathrm{Pl}$ is $D$, and $\mathrm{Pl}$ is algebraic with respect to $\oplus$ and
$\otimes$. Thus, for example, $\mathcal{PL}_{\mathbb{N}^*,\min,+}$ consists of all conditional ranking functions on $W$
defined from unconditional ranking functions by the construction in Section 2. Since a
cps $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl}) \in \mathcal{PL}_{D,\oplus,\otimes}$ is determined by $\mathrm{Pl}$, I often abuse notation and write $\mathrm{Pl} \in
\mathcal{PL}_{D,\oplus,\otimes}$.

With this notation, the question becomes whether a quantitative Bayesian network
$(G, f)$ such that the entries in the cpts are in $D$ determines a unique element in $\mathcal{PL}_{D,\oplus,\otimes}$. As
I now show, the answer is yes, provided $(D, \oplus, \otimes)$ satisfies some conditions. Characterizing
the conditions on $(D, \oplus, \otimes)$ required for this result turns out to be a little subtle. Indeed,
it is somewhat surprising how many assumptions are required to reproduce the simple
arguments that are required in the case of probability.

Definition 5.4: $(D, \oplus, \otimes)$ is a BN-compatible domain (with respect to $\mathcal{PL}_{D,\oplus,\otimes}$) if there
are sets $D(\otimes) \subseteq D \times D$ and $D(\oplus) \subseteq D \cup D^2 \cup D^3 \cup \ldots$ satisfying the following properties:

BN1. $\oplus$ and $\otimes$ are commutative and associative.

BN2. For all $d \in D$, $(\top, d), (\bot, d) \in D(\otimes)$, $(\bot, d) \in D(\oplus)$, $\top \otimes d = d$, $\bot \otimes d = \bot$, and
$\bot \oplus d = d$.

BN3. $\otimes$ distributes over $\oplus$; more precisely, $a \otimes (b_1 \oplus \cdots \oplus b_n) = (a \otimes b_1) \oplus \cdots \oplus (a \otimes b_n)$ if
$(a, b_1), \ldots, (a, b_n), (a, b_1 \oplus \cdots \oplus b_n) \in D(\otimes)$ and $(b_1, \ldots, b_n), (a \otimes b_1, \ldots, a \otimes b_n) \in D(\oplus)$;
moreover, $(a_1 \oplus \cdots \oplus a_n) \otimes b = a_1 \otimes b \oplus \cdots \oplus a_n \otimes b$ if $(a_1, \ldots, a_n), (a_1 \otimes b, \ldots, a_n \otimes b) \in
D(\oplus)$ and $(a_1 \oplus \cdots \oplus a_n, b), (a_1, b), \ldots, (a_n, b) \in D(\otimes)$.

BN4. If $(a, c), (b, c) \in D(\otimes)$, $a \otimes c \le b \otimes c$, and $c \ne \bot$, then $a \le b$.

BN5. If $(d_1, \ldots, d_k) \in D(\oplus)$ and $d_1 \oplus \cdots \oplus d_k \le d$, then there exists $(d'_1, \ldots, d'_k) \in D(\oplus)$
such that $(d'_1, d), \ldots, (d'_k, d), (d'_1 \oplus \cdots \oplus d'_k, d) \in D(\otimes)$, $d_i = d'_i \otimes d$, for $i = 1, \ldots, k$,
and $d_1 \oplus \cdots \oplus d_k = (d'_1 \oplus \cdots \oplus d'_k) \otimes d$.

BN6. $D(\oplus)$ is closed under permutations and prefixes, so that if $(x_1, \ldots, x_k) \in D(\oplus)$ and
$\pi$ is a permutation of $(1, \ldots, k)$, then $(x_{\pi(1)}, \ldots, x_{\pi(k)}) \in D(\oplus)$ and if $k' \le k$, then
$(x_1, \ldots, x_{k'}) \in D(\oplus)$; moreover $D(\oplus) \supseteq D$.

BN7. If $(d_1, \ldots, d_k), (d'_1, \ldots, d'_m) \in D(\oplus)$, $(d_i, d'_j) \in D(\otimes)$ for $i = 1, \ldots, k$, $j = 1, \ldots, m$,
then $(d_1 \otimes d'_1, \ldots, d_1 \otimes d'_m, \ldots, d_k \otimes d'_1, \ldots, d_k \otimes d'_m) \in D(\oplus)$.

BN8. If $(d_1, \ldots, d_k) \in D(\oplus)$ and $k' \le k$, then $d_1 \oplus \cdots \oplus d_{k'} \le d_1 \oplus \cdots \oplus d_k$.

Note that all the representations of uncertainty we have considered so far have associated
with them BN-compatible domains. Indeed, the definitions of $D(\oplus)$, $D(\otimes)$, $\oplus$, and $\otimes$ in
each case are given in the proof of Proposition 3.1. For example, for $\mathcal{PL}_{[0,1],\max,\min}$, the
set of conditional possibility measures determined by unconditional possibility measures,
$D(\oplus) = [0, 1] \times [0, 1]$, while $D(\otimes)$ consists of all pairs $(a, b) \in [0, 1] \times [0, 1]$ such that $a < b$
or $a = 1$. I leave it to the reader to check that, in all these cases, BN1-8 hold.

Given a tuple $\mathbf{x} = (x_1, \ldots, x_n) \in \{0, 1\}^n$, let $d_{X_i,G,\mathbf{x}}$ denote the value $d_{X_i=x_i|\mathrm{Par}_G(X_i)=\mathbf{y}}$,
where $\mathbf{y}$ is the restriction of $\mathbf{x}$ to the variables in $\mathrm{Par}_G(X_i)$.
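As a partial sanity check on the claim that the familiar representations yield BN-compatible domains, the identities in BN2 and the distributivity in BN3 can be spot-checked on sample values. The sketch below is only that: it ignores the restrictions to the two domain sets, tests a handful of values rather than proving anything, and assumes that for ranking functions the top and bottom elements are 0 and infinity. All names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product

# name -> (oplus, otimes, top, bottom, sample values)
DOMAINS = {
    "probability": (lambda a, b: a + b, lambda a, b: a * b, Fraction(1), Fraction(0),
                    [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]),
    # Ranking functions use (min, +); top = 0 and bottom = infinity are assumed here.
    "ranking": (min, lambda a, b: a + b, 0, math.inf, [0, 1, 3, math.inf]),
    "possibility": (max, min, Fraction(1), Fraction(0),
                    [Fraction(0), Fraction(1, 3), Fraction(2, 3), Fraction(1)]),
}

def check_bn2(oplus, otimes, top, bottom, values):
    """top (x) d = d, bottom (x) d = bottom, and bottom (+) d = d on the samples."""
    return all(otimes(top, d) == d and otimes(bottom, d) == bottom and oplus(bottom, d) == d
               for d in values)

def check_bn3(oplus, otimes, top, bottom, values):
    """a (x) (b (+) c) = (a (x) b) (+) (a (x) c) on all sample triples."""
    return all(otimes(a, oplus(b, c)) == oplus(otimes(a, b), otimes(a, c))
               for a, b, c in product(values, repeat=3))

RESULTS = {name: (check_bn2(*spec), check_bn3(*spec)) for name, spec in DOMAINS.items()}
```

Passing these checks is necessary but far from sufficient for BN-compatibility; BN4 through BN8 and the domain restrictions still have to be verified by hand.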

Definition 5.5: A quantitative Bayesian network $(G, f)$ is representable if the values of
the cpts for $G$ lie in a BN-compatible domain $(D, \oplus, \otimes)$ and the following properties hold:

R1. For every node $X$ in $G$ and every setting $\mathbf{y}$ of $\mathrm{Par}_G(X)$, $(d_{X=0|\mathrm{Par}_G(X)=\mathbf{y}}, d_{X=1|\mathrm{Par}_G(X)=\mathbf{y}}) \in
\mathit{Dom}(\oplus)$ and

$d_{X=0|\mathrm{Par}_G(X)=\mathbf{y}} \oplus d_{X=1|\mathrm{Par}_G(X)=\mathbf{y}} = \top$.

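R2. Suppose $Y_1, \ldots, Y_n$ is a topological sort of the nodes in $G$. Then for all $\mathbf{y} \in \{0, 1\}^n$
and all $1 \le j < k \le n$, $(d_{Y_j,G,\mathbf{y}}, d_{Y_{j+1},G,\mathbf{y}} \otimes \cdots \otimes d_{Y_k,G,\mathbf{y}}) \in D(\otimes)$ and $(d_{Y_j,G,\mathbf{y}} \otimes \cdots \otimes$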

R1 is the obvious analogue of the requirement in the probabilistic case that the entries
of the cpt for $X$, for a fixed setting of $X$'s parents, add up to 1. R2 essentially says that
certain terms (the ones required to compute the plausibility of $\mathbf{Y} = \mathbf{y}$ for $\mathbf{Y} = (Y_1, \ldots, Y_n)$)
are required to be in $D(\otimes)$, so that it makes sense to take their product. Since $D(\otimes) =
[0, 1] \times [0, 1]$ in the case of probability, there is no need to make this requirement explicit.
However, it is necessary for other representations of uncertainty.

The following result shows that, as the name suggests, there is a unique cpm that
represents a representable quantitative Bayesian network.

Theorem 5.6: If $(G, f)$ is representable and the values of the cpts for $G$ lie in a
BN-compatible domain $(D, \oplus, \otimes)$, then there is a unique cpm $\mathrm{Pl} \in \mathcal{PL}_{D,\oplus,\otimes}$ such that $(G, f)$
represents $\mathrm{Pl}$.
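To make the role of the cpt entries concrete, here is a small Python sketch of how the plausibility of a world and of an event could be computed from cpts, in the spirit of the remark that R2 concerns the terms needed to compute the plausibility of a full assignment. It uses the ranking-function domain (min as the additive operation, + as the multiplicative one, with top 0 and bottom infinity, which is an assumed convention), a hypothetical two-node network Y1 -> Y2, and made-up cpt values; none of these specifics come from the paper.

```python
import math
from functools import reduce

# Ranking-function operations: oplus = min, otimes = +, top = 0, bottom = infinity (assumed).
def oplus(a, b):
    return min(a, b)

def otimes(a, b):
    return a + b

TOP, BOTTOM = 0, math.inf

# Hypothetical cpts for the network Y1 -> Y2.
CPT_Y1 = {0: 0, 1: 2}                                   # d_{Y1 = i}
CPT_Y2 = {(0, 0): 0, (1, 0): 1, (0, 1): 3, (1, 1): 0}   # key (j, i): d_{Y2 = j | Y1 = i}

def world_plausibility(y1, y2):
    """Product (otimes) of the cpt entries selected by the world (y1, y2)."""
    return otimes(CPT_Y1[y1], CPT_Y2[(y2, y1)])

def event_plausibility(worlds):
    """Sum (oplus) of the plausibilities of the worlds in the event."""
    return reduce(oplus, (world_plausibility(*w) for w in worlds), BOTTOM)

ALL_WORLDS = [(0, 0), (0, 1), (1, 0), (1, 1)]
PL_W = event_plausibility(ALL_WORLDS)
PL_Y1_IS_1 = event_plausibility([(1, 0), (1, 1)])
PL_Y2_IS_1 = event_plausibility([(0, 1), (1, 1)])
```

In this example each cpt row combines to the top element (as R1 requires), the whole space receives the top element, and the marginal for Y1 = 1 agrees with its cpt entry; swapping in (+, x, 1, 0) for the four operations would give the usual probabilistic computation.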

5.3 D-Separation 

Just as in the case of probability, conditional independencies can be read off the Bayesian
network using the criterion of d-separation [Pearl 1988]. Recall that a set $\mathbf{X}$ of nodes
in $G = (V, E)$ is d-separated from a set $\mathbf{Y}$ of nodes by a set $\mathbf{Z}$ of nodes in $G$, written
$\text{d-sep}_G(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$, if, for every $X \in \mathbf{X}$, $Y \in \mathbf{Y}$, and trail from $X$ to $Y$ (that is, a sequence
$(X_0, \ldots, X_k)$ of nodes in $G$ such that $X_0 = X$, $X_k = Y$ and either $(X_i, X_{i+1})$ or $(X_{i+1}, X_i)$
is a directed edge in $G$), there is a node $X_i$ on the trail with $0 < i < k$ such that either:

(a) $X_i \in \mathbf{Z}$ and there is an arrow leading into $X_i$ and an arrow leading out (i.e., either
$(X_{i-1}, X_i), (X_i, X_{i+1}) \in E$ or $(X_i, X_{i-1}), (X_{i+1}, X_i) \in E$);

(b) $X_i \in \mathbf{Z}$ and $X_i$ is a tail-to-tail node (i.e., $(X_i, X_{i-1}), (X_i, X_{i+1}) \in E$);

(c) $X_i$ is a head-to-head node (i.e., $(X_{i-1}, X_i), (X_{i+1}, X_i) \in E$), and neither $X_i$ nor any
of its descendants are in $\mathbf{Z}$.

Let $\Sigma_{G,\mathrm{Pl}}$ consist of all statements of the form $I^{rv}_{\mathrm{Pl}}(X, \mathrm{ND}_G(X)|\mathrm{Par}_G(X))$. Let $\mathcal{PL}_{D,\oplus,\otimes}$
be an arbitrary collection of cps's of the form $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ where all components other
than $\mathrm{Pl}$ are fixed, and the plausibility measures $\mathrm{Pl}$ all have the same range $D$ of plausibility
values. Consider the following three statements:

1. $\text{d-sep}_G(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$.

2. $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ is provable from CIRV1-4 and $\Sigma_{G,\mathrm{Pl}}$.

3. $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ holds for every plausibility measure in $\mathcal{PL}_{D,\oplus,\otimes}$ compatible with $G$.

The implication from 1 to 2 is proved in [Geiger, Verma, and Pearl 1990; Verma 1986].



Theorem 5.7: [Geiger, Verma, and Pearl 1990; Verma 1986] If $\text{d-sep}_G(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$, then
$I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ is provable from CIRV1-4 and $\Sigma_{G,\mathrm{Pl}}$.



It is immediate from Theorem 4.5 that the implication from 2 to 3 holds for algebraic 
cpms. 

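Corollary 5.8: If $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ is provable from CIRV1-4 and $\Sigma_{G,\mathrm{Pl}}$, then $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$
holds for every algebraic cpm $\mathrm{Pl}$ compatible with $G$.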



Finally, the implication from 3 to 1 for probability measures is proved in [Geiger and
Pearl 1988; Geiger, Verma, and Pearl 1990]. Here I generalize the proof to algebraic
plausibility measures. Notice that to prove the implication from 3 to 1, it suffices to show that if
$\mathbf{X}$ is not d-separated from $\mathbf{Y}$ by $\mathbf{Z}$ in $G$, then there is a plausibility measure $\mathrm{Pl} \in \mathcal{PL}_{D,\oplus,\otimes}$
such that $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ does not hold. To guarantee that such a plausibility measure exists
in $\mathcal{PL}_{D,\oplus,\otimes}$, we have to ensure that there are "enough" plausibility measures in $\mathcal{PL}_{D,\oplus,\otimes}$
in the following technical sense.

Definition 5.9: A BN-compatible domain $(D, \oplus, \otimes)$ is rich if there exist $d, d' \in D$ such
that (1) $(d, d') \in D(\oplus)$, (2) $d \oplus d' = \top$, and (3) if $x = x_1 \otimes \ldots \otimes x_k$, where each $x_i$ is either
$d$ or $d'$ and $k < n$, then $(d, x)$, $(x, d)$, $(d', x)$, and $(x, d')$ are all in $D(\otimes)$ (intuitively, $D(\otimes)$
contains all products involving $d$ and $d'$ of length at most $n$). ∎

All the domains for the cps's we have considered are easily seen to be rich. 

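Theorem 5.10: Suppose that plausibility measures in $\mathcal{PL}_{D,\oplus,\otimes}$ take values in a rich
BN-compatible domain. Then if $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$ holds for every plausibility measure in $\mathcal{PL}_{D,\oplus,\otimes}$
compatible with $G$, then $\text{d-sep}_G(\mathbf{X}, \mathbf{Y}|\mathbf{Z})$.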
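Since the results of this section reduce independence questions to d-separation, it may help to see how the criterion can be tested mechanically. The Python sketch below uses a standard reachability procedure over (node, direction) pairs rather than enumerating trails; it is a well-known textbook technique, not something described in this paper, and the graph encoding (a dictionary mapping every node to the list of its parents) and all names are illustrative assumptions. The sets involved are assumed to be disjoint.

```python
def d_separated(parents, xs, ys, zs):
    """Return True if xs is d-separated from ys by zs in the dag given by `parents`."""
    children = {n: [] for n in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    zs = set(zs)

    # Nodes that are in zs or have a descendant in zs.
    anc, stack = set(), list(zs)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents[n])

    # 'up': the trail arrived at the node from one of its children;
    # 'down': the trail arrived at the node from one of its parents.
    reachable, visited = set(), set()
    stack = [(x, "up") for x in xs]
    while stack:
        node, direction = stack.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in zs:
            reachable.add(node)
        if direction == "up" and node not in zs:
            stack.extend((p, "up") for p in parents[node])
            stack.extend((c, "down") for c in children[node])
        elif direction == "down":
            if node not in zs:
                # Arrow in, arrow out: the trail continues to the children.
                stack.extend((c, "down") for c in children[node])
            if node in anc:
                # Head-to-head node that is in zs or has a descendant in zs.
                stack.extend((p, "up") for p in parents[node])
    return not (reachable & set(ys))

CHAIN = {"A": [], "B": ["A"], "C": ["B"]}                    # A -> B -> C
COLLIDER = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}   # A -> C <- B, C -> D
```

For the chain, A and C are d-separated by {B} but not by the empty set; for the collider, A and B are d-separated by the empty set but not by {C} or by {D}, since D is a descendant of the head-to-head node C.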

I remark that independence and d-separation for various approaches to representing sets
of probability measures using Bayesian networks are discussed by Cozman [2000b, 2000a].
However, the technical details are quite different from the approach taken here.

6. Conclusion



I have considered a general notion of conditional plausibility that generalizes all other
standard notions of conditioning in the literature, and examined various requirements that
could be imposed on conditional plausibility. One set of requirements, those that lead to
algebraic cps's, was shown to suffice for the construction of Bayesian networks. Further
assuming that the range $D$ of the plausibility measure is a BN-compatible domain suffices
for all the more quantitative properties of Bayesian networks to hold and for d-separation
to characterize the independencies. It should also be clear that standard constructions like
belief propagation in Bayesian networks [Pearl 1988] can also be applied to algebraic cps's
with ranges that are BN-compatible, since they typically use only basic properties of
conditioning, addition, and multiplication, all of which hold in BN-compatible domains (using
$\oplus$ and $\otimes$). In particular, these results apply to sets of probability measures, provided that
they are appropriately represented as plausibility measures. The particular representation
of sets of probability measures advocated in this paper was also shown to have a number of
other attractive properties.

Appendix A. Proofs 



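In this section I give the proofs of Theorems 4.5, 5.6, and 5.10. I repeat the statement of
the results for the convenience of the reader.

Lemma A.1: Suppose that $(W, \mathcal{F}, \mathcal{F}', \mathrm{Pl})$ is a cps, $A_1, \ldots, A_n$ is a partition of $W$, $X, A_1, \ldots, A_n \in
\mathcal{F}$, and $Y \in \mathcal{F}'$. Then

$\mathrm{Pl}(X|Y) = \oplus_{\{i : A_i \cap Y \in \mathcal{F}'\}} \mathrm{Pl}(X|A_i \cap Y) \otimes \mathrm{Pl}(A_i|Y)$.$^{10}$

Proof: Using an easy induction argument, it follows from Alg1 that

$\mathrm{Pl}(X|Y) = \oplus_{i=1}^{n} \mathrm{Pl}(X \cap A_i|Y)$.

If $A_i \cap Y \notin \mathcal{F}'$, then it follows from Acc4 that $\mathrm{Pl}(A_i|Y) = \bot$. Thus, by CPl3, $\mathrm{Pl}(X \cap A_i|Y) =
\bot$. Using Lemma 3.2, it follows that

$\mathrm{Pl}(X|Y) = \oplus_{\{i : A_i \cap Y \in \mathcal{F}'\}} \mathrm{Pl}(X \cap A_i|Y)$.

If $A_i \cap Y \in \mathcal{F}'$, then it follows from Alg2 that $\mathrm{Pl}(X \cap A_i|Y) = \mathrm{Pl}(X|A_i \cap Y) \otimes \mathrm{Pl}(A_i|Y)$.
Thus,

$\mathrm{Pl}(X|Y) = \oplus_{\{i : A_i \cap Y \in \mathcal{F}'\}} \mathrm{Pl}(X|A_i \cap Y) \otimes \mathrm{Pl}(A_i|Y)$,

as desired. ∎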



Theorem |4.5| : CIRV1-4 hold for all algebraic cps's. 

Proof: CIRV1 is immediate from the fact that independence is symmetric. 

10. Notice that if $A_i \cap Y \in \mathcal{F}'$, then $\mathrm{Pl}(X \mid A_i \cap Y) \otimes \mathrm{Pl}(A_i \mid Y) = \mathrm{Pl}(X \cap A_i \mid Y)$ by Alg2. Thus, the terms
arising on the right-hand side of the equation in Lemma A.1 are in $\mathit{Dom}(\oplus)$. This means that there is
no need to put in parentheses; $\oplus$ is associative on terms in $\mathit{Dom}(\oplus)$.

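For CIRV2, suppose that $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}' \mid \mathbf{Z})$. We must show $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$. That is, we
must show that $I_{\mathrm{Pl}}(\mathbf{X} = \mathbf{x}, \mathbf{Y} = \mathbf{y} \mid \mathbf{Z} = \mathbf{z})$, for all $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$. This requires showing two
things.

2(a). If $\mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$, then

\[ \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{Z} = \mathbf{z}). \]

2(b). If $\mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$, then

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}). \]

For 2(a), suppose that $\mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$. From $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}' \mid \mathbf{Z})$, it follows that
$I_{\mathrm{Pl}}(\mathbf{X} = \mathbf{x}, \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z})$ for all $\mathbf{y}'$. Hence

\[ \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}) \tag{4} \]

for all $\mathbf{y}' \in \mathcal{R}(\mathbf{Y}')$. From (4) it follows that

\[ \bigoplus_{\mathbf{y}'} \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \bigoplus_{\mathbf{y}'} \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}). \]

Thus,

\[ \mathrm{Pl}\Bigl(\bigcup_{\mathbf{y}'} (\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}') \Bigm| \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}\Bigr) = \mathrm{Pl}\Bigl(\bigcup_{\mathbf{y}'} (\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}') \Bigm| \mathbf{Z} = \mathbf{z}\Bigr). \]

Since $\bigcup_{\mathbf{y}'} (\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}')$ is the event $\mathbf{Y} = \mathbf{y}$, 2(a) holds.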

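For 2(b), from $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}' \mid \mathbf{Z})$, it follows that if $\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$, then

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}). \tag{5} \]

From (5) and Lemma A.1, it follows that

\[
\begin{aligned}
& \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) \\
&= \bigoplus_{\{\mathbf{y}' \,:\, \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'\}} \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) \\
&= \bigoplus_{\{\mathbf{y}' \,:\, \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'\}} \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}).
\end{aligned}
\tag{6}
\]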

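By Acc4, it follows that if $\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \notin \mathcal{F}'$, then
$\mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) = \bot$. Thus, by Lemma 3.2, Alg1, CPl2, and CPl4,

\[
\begin{aligned}
& \bigoplus_{\{\mathbf{y}' \,:\, \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'\}} \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) \\
&= \bigoplus_{\mathbf{y}'} \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) \\
&= \mathrm{Pl}(W \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) \\
&= \top.
\end{aligned}
\tag{7}
\]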

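The next step is to apply distributivity (Alg3) to the last line of (6). To do this, we
must show that certain tuples are in $\mathit{Dom}(\oplus)$ and $\mathit{Dom}(\otimes)$, respectively. Since

\[ (\mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}), \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z})) \in \mathit{Dom}(\otimes), \]

from (5) it follows that

\[ (\mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}), \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z})) \in \mathit{Dom}(\otimes). \]

If $\{\mathbf{y}'_1, \ldots, \mathbf{y}'_k\} = \{\mathbf{y}' \in \mathcal{R}(\mathbf{Y}') : \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'\}$, then clearly

\[ (\mathrm{Pl}(\mathbf{Y}' = \mathbf{y}'_1 \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}), \ldots, \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}'_k \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z})) \in \mathit{Dom}(\oplus). \]

Moreover, using (5) again and Alg2, it follows that

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}'_j \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \cap \mathbf{Y}' = \mathbf{y}'_j \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}). \]

Thus,

\[ (\mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}'_1 \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}), \ldots, \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}'_k \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z})) \in \mathit{Dom}(\oplus). \]

Finally, since (7) shows that $\bigoplus_{\{\mathbf{y}' \,:\, \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'\}} \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) = \top$ and, by the
proof of Lemma 3.3, $(d, \top) \in \mathit{Dom}(\otimes)$ for all $d \in \mathit{Range}(\mathrm{Pl})$, it follows that

\[ \Bigl(\mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}), \bigoplus_{\{\mathbf{y}' \,:\, \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'\}} \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z})\Bigr) \in \mathit{Dom}(\otimes). \]

It now follows, using Alg3, (7), and Lemma 3.3, that

\[
\begin{aligned}
& \bigoplus_{\{\mathbf{y}' \,:\, \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'\}} \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) \\
&= \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}) \otimes \Bigl(\bigoplus_{\{\mathbf{y}' \,:\, \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'\}} \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z})\Bigr) \\
&= \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}) \otimes \top \\
&= \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}).
\end{aligned}
\]

Thus, from (6), it follows that $\mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z})$. This
completes the proof of 2(b) and CIRV2.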
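
CIRV2 can be checked numerically in a non-probabilistic instance. The sketch below assumes a ranking function, with $\oplus = \min$, $\otimes = +$, $\bot = \infty$, and conditioning by subtraction; the specific ranks and all helper names are hypothetical. The ranks are built so that $\mathbf{X}$ is independent of $\mathbf{Y} \cup \mathbf{Y}'$ given $\mathbf{Z}$, and one world has rank $\infty$ so that some conditioning events fall outside $\mathcal{F}'$.

```python
import math
from itertools import product

INF = math.inf          # plays the role of bottom for ranking functions
VALS = (0, 1)

# Hypothetical ranks, chosen so that kappa(x, y, y2, z) splits into a part
# depending on (x, z) and a part depending on (y, y2, z).
k_z = {0: 0, 1: 2}
k_x = {0: {0: 0, 1: 1}, 1: {0: 3, 1: 0}}                      # k_x[z][x]
k_yy = {0: {(0, 0): 0, (0, 1): 2, (1, 0): 1, (1, 1): INF},
        1: {(0, 0): 4, (0, 1): 0, (1, 0): 0, (1, 1): 1}}      # k_yy[z][(y, y2)]

# A world is a tuple (x, y, y2, z); positions 0..3 index X, Y, Y', Z.
kappa = {(x, y, y2, z): k_z[z] + k_x[z][x] + k_yy[z][(y, y2)]
         for x, y, y2, z in product(VALS, repeat=4)}


def rank(event):
    """Rank of an event: the minimum rank of its worlds (INF if there are none)."""
    return min((k for w, k in kappa.items() if event(w)), default=INF)


def cond(u, v):
    """Conditional rank kappa(u | v) = kappa(u and v) - kappa(v); needs rank(v) < INF."""
    return rank(lambda w: u(w) and v(w)) - rank(v)


def setting(idx, values):
    """Event that the variables at positions idx take the given values."""
    return lambda w: tuple(w[i] for i in idx) == values


def independent(a_idx, b_idx, c_idx):
    """Check I^rv(A, B | C): both equalities, wherever conditioning is allowed."""
    for a in product(VALS, repeat=len(a_idx)):
        for b in product(VALS, repeat=len(b_idx)):
            for c in product(VALS, repeat=len(c_idx)):
                A, B, C = setting(a_idx, a), setting(b_idx, b), setting(c_idx, c)
                AC = lambda w, A=A, C=C: A(w) and C(w)
                BC = lambda w, B=B, C=C: B(w) and C(w)
                if rank(AC) < INF and cond(B, AC) != cond(B, C):
                    return False
                if rank(BC) < INF and cond(A, BC) != cond(A, C):
                    return False
    return True


hypothesis = independent((0,), (1, 2), (3,))   # I(X, Y u Y' | Z), by construction
conclusion = independent((0,), (1,), (3,))     # I(X, Y | Z), as CIRV2 predicts
y_vs_y2 = independent((1,), (2,), (3,))        # Y and Y' were not built to be independent
```

The last check is included only to show that `independent` can fail: with these ranks, $\mathbf{Y}$ and $\mathbf{Y}'$ are dependent given $\mathbf{Z}$.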

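For CIRV3, suppose that $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}' \mid \mathbf{Z})$. We must show that $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \mid \mathbf{Y}' \cup \mathbf{Z})$. This
again requires showing two things:

3(a). If $\mathbf{X} = \mathbf{x} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$, then

\[ \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}). \]

3(b). If $\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$, then

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}). \]

For 3(a), suppose that $\mathbf{X} = \mathbf{x} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$. Thus, by Acc3, $\mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$.
Since $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}' \mid \mathbf{Z})$, it follows that

\[ \mathrm{Pl}(\mathbf{Y} = \mathbf{y}'' \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y} = \mathbf{y}'' \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}) \tag{8} \]

for all $\mathbf{y}'' \in \mathcal{R}(\mathbf{Y})$. Applying Alg2 to each side of (8) (with $\mathbf{y}'' = \mathbf{y}$), it follows that

\[
\begin{aligned}
& \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{Y}' = \mathbf{y}' \cap \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) \\
&= \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}).
\end{aligned}
\]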






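Thus, to prove 3(a), it follows from Alg4 that it suffices to show that

\[ \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}) \ne \bot. \]

But by (8) and Alg1, it follows that

\[
\begin{aligned}
\mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z})
&= \bigoplus_{\mathbf{y}'' \in \mathcal{R}(\mathbf{Y})} \mathrm{Pl}(\mathbf{Y} = \mathbf{y}'' \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) \\
&= \bigoplus_{\mathbf{y}'' \in \mathcal{R}(\mathbf{Y})} \mathrm{Pl}(\mathbf{Y} = \mathbf{y}'' \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}) \\
&= \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}),
\end{aligned}
\]

as desired. Moreover, since $\mathbf{X} = \mathbf{x} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$, it follows from Acc4 that
$\mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}) \ne \bot$.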

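For 3(b), suppose that $\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$. Since $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}' \mid \mathbf{Z})$, it follows
that

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}). \]

Thus, to prove 3(b), it suffices to show that

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}). \tag{9} \]

Recall that we are assuming that $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}' \mid \mathbf{Z})$. By CIRV2, it follows that $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}' \mid \mathbf{Z})$.
Thus, (9) is immediate from 2(b) (since $\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$ implies that
$\mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$).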

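Finally, consider CIRV4. Suppose that $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$ and $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}' \mid \mathbf{Y} \cup \mathbf{Z})$. We must
show that $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \cup \mathbf{Y}' \mid \mathbf{Z})$. As usual, this requires showing two things:

4(a). If $\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$, then

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}). \]

4(b). If $\mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$, then

\[ \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}). \]

Both 4(a) and 4(b) are straightforward. For 4(a), suppose that $\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$.
Since $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}' \mid \mathbf{Y} \cup \mathbf{Z})$, it follows that

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}). \]

And since $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$, it follows that

\[ \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{X} = \mathbf{x} \mid \mathbf{Z} = \mathbf{z}). \]

Thus we have 4(a).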

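For 4(b), suppose that $\mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$. There are now two cases to consider. If
$\mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) \ne \bot$ then, by Acc4, $\mathbf{X} = \mathbf{x} \cap \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z} \in \mathcal{F}'$. Moreover, by
Alg2,

\[ \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}). \tag{10} \]

Since $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y}' \mid \mathbf{Y} \cup \mathbf{Z})$, it follows that

\[ \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}). \]

And since $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$, it follows that $\mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{Z} = \mathbf{z})$.
Plugging this into (10) and applying Alg2 again gives

\[
\begin{aligned}
\mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z})
&= \mathrm{Pl}(\mathbf{Y}' = \mathbf{y}' \mid \mathbf{Y} = \mathbf{y} \cap \mathbf{Z} = \mathbf{z}) \otimes \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{Z} = \mathbf{z}) \\
&= \mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}),
\end{aligned}
\]

as desired.

Now if $\mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \bot$, then by CPl3, it follows that
$\mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{X} = \mathbf{x} \cap \mathbf{Z} = \mathbf{z}) = \bot$. Moreover, since $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$, it follows that
$\mathrm{Pl}(\mathbf{Y} = \mathbf{y} \mid \mathbf{Z} = \mathbf{z}) = \bot$. Applying CPl3, we get that $\mathrm{Pl}(\mathbf{Y} = \mathbf{y} \cap \mathbf{Y}' = \mathbf{y}' \mid \mathbf{Z} = \mathbf{z}) = \bot$. Thus, again 4(b) holds. ∎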
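
For the probabilistic instance, CIRV3 and CIRV4 can be checked on a small joint distribution. This is only a sketch under stated assumptions: the factor tables are hypothetical, $\mathcal{F}'$ is the set of events with positive probability, and some entries are zero so that the side conditions of the form "if the conditioning event is in $\mathcal{F}'$" actually matter.

```python
from fractions import Fraction as Fr
from itertools import product

VALS = (0, 1)

# Hypothetical factors; the joint is p_z * p_x * p_yy, so X is independent
# of (Y, Y') given Z by construction.
p_z = {0: Fr(1, 3), 1: Fr(2, 3)}
p_x = {0: {0: Fr(1, 4), 1: Fr(3, 4)}, 1: {0: Fr(1), 1: Fr(0)}}           # p_x[z][x]
p_yy = {0: {(0, 0): Fr(1, 2), (0, 1): Fr(1, 2), (1, 0): Fr(0), (1, 1): Fr(0)},
        1: {(0, 0): Fr(1, 8), (0, 1): Fr(3, 8), (1, 0): Fr(1, 4), (1, 1): Fr(1, 4)}}

# A world is a tuple (x, y, y2, z); positions 0..3 index X, Y, Y', Z.
mu = {(x, y, y2, z): p_z[z] * p_x[z][x] * p_yy[z][(y, y2)]
      for x, y, y2, z in product(VALS, repeat=4)}


def prob(event):
    return sum((p for w, p in mu.items() if event(w)), Fr(0))


def cond(u, v):
    """P(u | v); callers must ensure prob(v) > 0."""
    return prob(lambda w: u(w) and v(w)) / prob(v)


def setting(idx, values):
    """Event that the variables at positions idx take the given values."""
    return lambda w: tuple(w[i] for i in idx) == values


def independent(a_idx, b_idx, c_idx):
    """Check I^rv(A, B | C), testing each equality only when it is defined."""
    for a in product(VALS, repeat=len(a_idx)):
        for b in product(VALS, repeat=len(b_idx)):
            for c in product(VALS, repeat=len(c_idx)):
                A, B, C = setting(a_idx, a), setting(b_idx, b), setting(c_idx, c)
                AC = lambda w, A=A, C=C: A(w) and C(w)
                BC = lambda w, B=B, C=C: B(w) and C(w)
                if prob(AC) > 0 and cond(B, AC) != cond(B, C):
                    return False
                if prob(BC) > 0 and cond(A, BC) != cond(A, C):
                    return False
    return True


# CIRV3: from I(X, Y u Y' | Z) conclude I(X, Y | Y' u Z).
cirv3_hypothesis = independent((0,), (1, 2), (3,))
cirv3_conclusion = independent((0,), (1,), (2, 3))

# CIRV4: from I(X, Y | Z) and I(X, Y' | Y u Z) conclude I(X, Y u Y' | Z).
cirv4_hypotheses = independent((0,), (1,), (3,)) and independent((0,), (2,), (1, 3))
cirv4_conclusion = independent((0,), (1, 2), (3,))
```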



Theorem 5.6: If $(G, f)$ is representable and the values of the cpts for $G$ lie in a
BN-compatible domain $(D, \oplus, \otimes)$, then there is a unique cpm $\mathrm{Pl} \in \mathcal{PL}_{D,\oplus,\otimes}$ such that $(G, f)$
represents $\mathrm{Pl}$.

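Proof: Given $(G, f)$, suppose without loss of generality that $\mathbf{X} = (X_1, \ldots, X_n)$ is a
topological sort of the nodes in $G$. I now define the plausibility measure $\mathrm{Pl}_{(G,f)}$ determined by $(G, f)$.
I start by defining $\mathrm{Pl}_{(G,f)}$ on sets of the form $\mathbf{X} = \mathbf{x}$.

It easily follows from Alg2 that if $\mathrm{Pl} \in \mathcal{PL}_{D,\oplus,\otimes}$ and $\mathrm{Pl}(X_1 = x_1 \cap \ldots \cap X_{n-1} = x_{n-1}) \ne \bot$, then

\[
\begin{aligned}
\mathrm{Pl}(\mathbf{X} = \mathbf{x}) = {} & \mathrm{Pl}(X_n = x_n \mid X_1 = x_1 \cap \ldots \cap X_{n-1} = x_{n-1}) \otimes {} \\
& \mathrm{Pl}(X_{n-1} = x_{n-1} \mid X_1 = x_1 \cap \ldots \cap X_{n-2} = x_{n-2}) \otimes {} \\
& \cdots \otimes \mathrm{Pl}(X_2 = x_2 \mid X_1 = x_1) \otimes \mathrm{Pl}(X_1 = x_1).
\end{aligned}
\tag{11}
\]

(Since $\otimes$ in $D$ is assumed to be associative, no parentheses are required here. However, even
without this assumption, it follows easily from Alg2 that $\otimes$ is in fact associative on tuples
$(a, b, c)$ of the form $(\mathrm{Pl}(U_1 \mid U_2), \mathrm{Pl}(U_2 \mid U_3), \mathrm{Pl}(U_3 \mid U_4))$, where $U_1 \subseteq U_2 \subseteq U_3 \subseteq U_4$, which are
the only types of tuples that arise in (11). Associativity will be more of an issue below.)
If $\mathrm{Pl}$ is to be compatible with $G$, then in fact

\[
\begin{aligned}
\mathrm{Pl}(\mathbf{X} = \mathbf{x}) = {} & \mathrm{Pl}\bigl(X_n = x_n \bigm| \textstyle\bigcap_{X_j \in \mathrm{Par}_G(X_n)} X_j = x_j\bigr) \otimes {} \\
& \mathrm{Pl}\bigl(X_{n-1} = x_{n-1} \bigm| \textstyle\bigcap_{X_j \in \mathrm{Par}_G(X_{n-1})} X_j = x_j\bigr) \otimes {} \\
& \cdots \otimes \mathrm{Pl}(X_1 = x_1).
\end{aligned}
\tag{12}
\]

(If $\mathrm{Par}_G(X_k) = \emptyset$, then $\mathrm{Pl}(X_k = x_k \mid \bigcap_{X_j \in \mathrm{Par}_G(X_k)} X_j = x_j)$ is just taken to be $\mathrm{Pl}(X_k = x_k)$.)
It is clear from (12) that $\mathrm{Pl}_{(G,f)}(\mathbf{X} = \mathbf{x})$ must be $d_{X_n,G,\mathbf{x}} \otimes \cdots \otimes d_{X_1,G,\mathbf{x}}$.
Note that every subset of $W$ can be written as a disjoint union of events of the form
$\mathbf{X} = \mathbf{x}$. Thus, if $U \in \mathcal{F}$, define

\[ \mathrm{Pl}_{(G,f)}(U) = \bigoplus_{\{\mathbf{x} \,:\, \mathbf{X} = \mathbf{x} \subseteq U\}} d_{X_n,G,\mathbf{x}} \otimes \cdots \otimes d_{X_1,G,\mathbf{x}}. \]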
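
The definition of the unconditional measure can be sketched in code as follows. This is an illustrative sketch, not an implementation from the paper: the function `make_pl`, the three-node graph $A \to C \leftarrow B$, the binary value space, and all cpt entries are assumptions. The same construction is instantiated with $(+, \times)$ for probability and $(\min, +)$ for ranking functions.

```python
import math
import operator
from fractions import Fraction as Fr
from functools import reduce
from itertools import product


def make_pl(order, parents, cpt, oplus, otimes, bottom):
    """Build U -> Pl(U) from cpts, following the construction above.

    order:   topological sort of the (binary) nodes
    parents: node -> tuple of parent nodes
    cpt:     node -> {(value, parent_values): d}
    """
    worlds = [dict(zip(order, vs)) for vs in product((0, 1), repeat=len(order))]

    def world_value(world):
        # Product of one cpt entry per node, taken in reverse topological order.
        entries = [cpt[v][(world[v], tuple(world[p] for p in parents[v]))]
                   for v in reversed(order)]
        return reduce(otimes, entries)

    def pl(event):
        # Combine the worlds contained in the event; the empty sum is bottom.
        return reduce(oplus, (world_value(w) for w in worlds if event(w)), bottom)

    return pl


order = ["A", "B", "C"]
parents = {"A": (), "B": (), "C": ("A", "B")}

# Probability instance: (oplus, otimes, bottom) = (+, *, 0).
q = {(0, 0): Fr(1, 10), (0, 1): Fr(1, 2), (1, 0): Fr(1, 2), (1, 1): Fr(9, 10)}
prob_cpt = {
    "A": {(0, ()): Fr(3, 4), (1, ()): Fr(1, 4)},
    "B": {(0, ()): Fr(1, 2), (1, ()): Fr(1, 2)},
    "C": {**{(1, ab): q[ab] for ab in q}, **{(0, ab): 1 - q[ab] for ab in q}},
}
pl_prob = make_pl(order, parents, prob_cpt, operator.add, operator.mul, Fr(0))

# Ranking instance: (oplus, otimes, bottom) = (min, +, infinity).
rank_cpt = {
    "A": {(0, ()): 0, (1, ()): 1},
    "B": {(0, ()): 0, (1, ()): 2},
    "C": {(c, (a, b)): 0 if c == (a + b) % 2 else 3
          for c, a, b in product((0, 1), repeat=3)},
}
pl_rank = make_pl(order, parents, rank_cpt, min, operator.add, math.inf)

c_is_1 = lambda w: w["C"] == 1
prob_c1 = pl_prob(c_is_1)
rank_c1 = pl_rank(c_is_1)
```

Starting the fold from `bottom` relies on $\bot$ being an identity for $\oplus$, which holds in both instances used here.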
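For conditional plausibilities, suppose that $\mathrm{Pl}_{(G,f)}(V) \ne \bot$, so that $V \in \mathcal{F}'$. Let
$\{\mathbf{x}^1, \ldots, \mathbf{x}^k\} = \{\mathbf{x} : \mathbf{X} = \mathbf{x} \subseteq V\}$. It follows easily from BN6, BN7, R1, and R2 that
$(\mathrm{Pl}_{(G,f)}(\mathbf{X} = \mathbf{x}^1), \ldots, \mathrm{Pl}_{(G,f)}(\mathbf{X} = \mathbf{x}^k)) \in D(\oplus)$. Thus, by BN8, if $\mathbf{X} = \mathbf{x} \subseteq V$, then
$\mathrm{Pl}_{(G,f)}(\mathbf{X} = \mathbf{x}) \le \mathrm{Pl}_{(G,f)}(V)$. By BN5, for each $j$, there exists $d_{\mathbf{X} = \mathbf{x}^j \mid V}$ such that

\[ (d_{\mathbf{X} = \mathbf{x}^j \mid V}, \mathrm{Pl}_{(G,f)}(V)) \in D(\otimes) \quad \text{and} \quad d_{\mathbf{X} = \mathbf{x}^j \mid V} \otimes \mathrm{Pl}_{(G,f)}(V) = \mathrm{Pl}_{(G,f)}(\mathbf{X} = \mathbf{x}^j); \]

it follows from BN4 that $d_{\mathbf{X} = \mathbf{x}^j \mid V}$ is the unique element in $D$ with this property. Moreover,
by BN5, $(d_{\mathbf{X} = \mathbf{x}^1 \mid V}, \ldots, d_{\mathbf{X} = \mathbf{x}^k \mid V}) \in D(\oplus)$. Define $\mathrm{Pl}_{(G,f)}(U \mid V) = \bigoplus_{\{\mathbf{x} \,:\, \mathbf{X} = \mathbf{x} \subseteq U \cap V\}} d_{\mathbf{X} = \mathbf{x} \mid V}$
(where $\mathrm{Pl}_{(G,f)}(\emptyset \mid V)$ is taken to be $\bot$). Note for future reference that it follows from BN5
that $(\mathrm{Pl}_{(G,f)}(U \mid V), \mathrm{Pl}_{(G,f)}(V)) \in D(\otimes)$ and

\[ \mathrm{Pl}_{(G,f)}(U \mid V) \otimes \mathrm{Pl}_{(G,f)}(V) = \mathrm{Pl}_{(G,f)}(U \cap V). \tag{13} \]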

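This completes the definition of $\mathrm{Pl}_{(G,f)}$. It remains to check that it is an algebraic cpm
that is represented by $(G, f)$. Thus, we must check that Alg1-4 and CPl1-4 hold. Alg1
is immediate from the definitions and BN1 and BN2 (BN2 is necessary for the case that
one of the disjoint sets is empty); Alg3 is immediate from BN3 and Alg4 is immediate
from BN4. For Alg2, note that if $\mathrm{Pl}_{(G,f)}(V \cap V') \ne \bot$ and $\mathrm{Pl}_{(G,f)}(V') \ne \bot$ then, by (13),

\[ \mathrm{Pl}_{(G,f)}(U \cap V \mid V') \otimes \mathrm{Pl}_{(G,f)}(V') = \mathrm{Pl}_{(G,f)}(U \cap V \cap V') \]

and

\[
\begin{aligned}
& (\mathrm{Pl}_{(G,f)}(U \mid V \cap V') \otimes \mathrm{Pl}_{(G,f)}(V \mid V')) \otimes \mathrm{Pl}_{(G,f)}(V') \\
&= \mathrm{Pl}_{(G,f)}(U \mid V \cap V') \otimes (\mathrm{Pl}_{(G,f)}(V \mid V') \otimes \mathrm{Pl}_{(G,f)}(V')) \\
&= \mathrm{Pl}_{(G,f)}(U \mid V \cap V') \otimes \mathrm{Pl}_{(G,f)}(V \cap V') \\
&= \mathrm{Pl}_{(G,f)}(U \cap V \cap V').
\end{aligned}
\]

(Note that the associativity of $\otimes$ is being used here.) Thus, by BN4,

\[ \mathrm{Pl}_{(G,f)}(U \cap V \mid V') = \mathrm{Pl}_{(G,f)}(U \mid V \cap V') \otimes \mathrm{Pl}_{(G,f)}(V \mid V'). \]

CPl1 is immediate by definition (the empty sum is taken to be $\bot$). For CPl2, note that
by (13), $\mathrm{Pl}_{(G,f)}(W \mid V) \otimes \mathrm{Pl}_{(G,f)}(V) = \mathrm{Pl}_{(G,f)}(V)$. Since $\top \otimes \mathrm{Pl}_{(G,f)}(V) = \mathrm{Pl}_{(G,f)}(V)$ by
BN2, it follows from BN4 that $\mathrm{Pl}_{(G,f)}(W \mid V) = \top$. CPl3 follows readily from the definitions
together with BN1, BN6, and BN7. CPl4 also follows by definition.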

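Next we must show that $(G, f)$ represents $\mathrm{Pl}_{(G,f)}$. The first step is to show that
$\mathrm{Pl}_{(G,f)}(X = x \mid \mathrm{Par}_G(X) = \mathbf{z}) = d_{X = x \mid \mathrm{Par}_G(X) = \mathbf{z}}$. Note that by (13),

\[ \mathrm{Pl}_{(G,f)}(X = x \mid \mathrm{Par}_G(X) = \mathbf{z}) \otimes \mathrm{Pl}_{(G,f)}(\mathrm{Par}_G(X) = \mathbf{z}) = \mathrm{Pl}_{(G,f)}(X = x \cap \mathrm{Par}_G(X) = \mathbf{z}). \]

By definition,

\[ \mathrm{Pl}_{(G,f)}(X = x \cap \mathrm{Par}_G(X) = \mathbf{z}) = \bigoplus_{\{\mathbf{x}' \,:\, \mathbf{X} = \mathbf{x}' \subseteq (X = x \cap \mathrm{Par}_G(X) = \mathbf{z})\}} \mathrm{Pl}_{(G,f)}(\mathbf{X} = \mathbf{x}'). \]

Each term in the "sum" on the right is the "product" of terms; indeed, the sum is over all
possible products that include $d_{X = x \mid \mathrm{Par}_G(X) = \mathbf{z}}$ as one of the terms and a term $d_{Y = y \mid \mathrm{Par}_G(Y) = \mathbf{z}'}$
for each $Y \in \mathrm{Par}_G(X)$, where $y$ is the component of $\mathbf{z}$ corresponding to $Y$. By using BN1,
BN3, R1, and R2, it is not hard to show that

\[
\begin{aligned}
\mathrm{Pl}_{(G,f)}(X = x \cap \mathrm{Par}_G(X) = \mathbf{z})
&= \bigoplus_{\{\mathbf{x}' \,:\, \mathbf{X} = \mathbf{x}' \subseteq (X = x \cap \mathrm{Par}_G(X) = \mathbf{z})\}} \mathrm{Pl}_{(G,f)}(\mathbf{X} = \mathbf{x}') \\
&= d_{X = x \mid \mathrm{Par}_G(X) = \mathbf{z}} \otimes \mathrm{Pl}_{(G,f)}(\mathrm{Par}_G(X) = \mathbf{z}).
\end{aligned}
\tag{14}
\]

It now follows from BN4 that $\mathrm{Pl}_{(G,f)}(X = x \mid \mathrm{Par}_G(X) = \mathbf{z}) = d_{X = x \mid \mathrm{Par}_G(X) = \mathbf{z}}$.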

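To show that $\mathrm{Pl}_{(G,f)}(X = x \mid \mathrm{ND}_G(X) = \mathbf{y} \cap \mathrm{Par}_G(X) = \mathbf{z}) = d_{X = x \mid \mathrm{Par}_G(X) = \mathbf{z}}$, it suffices
to show that

\[
\begin{aligned}
& \mathrm{Pl}_{(G,f)}(X = x \cap \mathrm{ND}_G(X) = \mathbf{y} \cap \mathrm{Par}_G(X) = \mathbf{z}) \\
&= d_{X = x \mid \mathrm{Par}_G(X) = \mathbf{z}} \otimes \mathrm{Pl}_{(G,f)}(\mathrm{ND}_G(X) = \mathbf{y} \cap \mathrm{Par}_G(X) = \mathbf{z}),
\end{aligned}
\tag{15}
\]

for then the result follows by BN5. (15) can be shown much like (14), but now the
commutativity of $\otimes$ (BN1) is essential. That is, the expressions for
$\mathrm{Pl}_{(G,f)}(X = x \cap \mathrm{ND}_G(X) = \mathbf{y} \cap \mathrm{Par}_G(X) = \mathbf{z})$ and
$d_{X = x \mid \mathrm{Par}_G(X) = \mathbf{z}} \otimes \mathrm{Pl}_{(G,f)}(\mathrm{ND}_G(X) = \mathbf{y} \cap \mathrm{Par}_G(X) = \mathbf{z})$ involve
the same terms, but not necessarily in the same order. With commutativity, they can be
permuted so that they are in the same order.

Similar arguments, which I leave to the reader, show that
$\mathrm{Pl}_{(G,f)}(\mathrm{ND}_G(X) = \mathbf{y} \mid X = x \cap \mathrm{Par}_G(X) = \mathbf{z}) = \mathrm{Pl}_{(G,f)}(\mathrm{ND}_G(X) = \mathbf{y} \mid \mathrm{Par}_G(X) = \mathbf{z})$. Thus, $(G, f)$ represents $\mathrm{Pl}_{(G,f)}$. ∎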
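
In the probabilistic instance, the two facts established in this proof (the constructed measure returns the cpt entries, and a node is independent of its non-descendants given its parents) can be checked directly. The chain $A \to B \to C$ and its cpt values below are hypothetical; one entry is zero, so the event $A = 0 \cap B = 1$ is not in $\mathcal{F}'$ and is skipped.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical cpts for the chain A -> B -> C (binary variables).
cpt_a = {0: Fr(1, 3), 1: Fr(2, 3)}
cpt_b = {(0, 0): Fr(1), (1, 0): Fr(0),
         (0, 1): Fr(1, 4), (1, 1): Fr(3, 4)}          # keyed by (b, a)
cpt_c = {(0, 0): Fr(1, 5), (1, 0): Fr(4, 5),
         (0, 1): Fr(1, 2), (1, 1): Fr(1, 2)}          # keyed by (c, b)

# Plausibility of each world (a, b, c): product of one cpt entry per node.
joint = {(a, b, c): cpt_c[(c, b)] * cpt_b[(b, a)] * cpt_a[a]
         for a, b, c in product((0, 1), repeat=3)}


def prob(event):
    return sum((p for w, p in joint.items() if event(w)), Fr(0))


def cond(u, v):
    """P(u | v); callers must ensure prob(v) > 0."""
    return prob(lambda w: u(w) and v(w)) / prob(v)


# Conditioning C on its parent B gives back the cpt entry.
recovers_cpt = all(
    cond(lambda w, c=c: w[2] == c, lambda w, b=b: w[1] == b) == cpt_c[(c, b)]
    for b, c in product((0, 1), repeat=2)
)

# Adding the non-descendant A to the conditioning event changes nothing,
# whenever that event has positive probability.
pairs = list(product((0, 1), repeat=2))                # (a, b)
skipped = [(a, b) for a, b in pairs
           if prob(lambda w, a=a, b=b: w[0] == a and w[1] == b) == 0]
screens_off = all(
    cond(lambda w, c=c: w[2] == c,
         lambda w, a=a, b=b: w[0] == a and w[1] == b) == cpt_c[(c, b)]
    for a, b in pairs if (a, b) not in skipped
    for c in (0, 1)
)
```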



Theorem 5.10: Suppose that $(D, \oplus, \otimes)$ is a rich BN-compatible domain. If $I^{rv}_{\mathrm{Pl}}(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$
holds for every plausibility measure in $\mathcal{PL}_{D,\oplus,\otimes}$ compatible with $G$, then $\text{d-sep}_G(\mathbf{X}, \mathbf{Y} \mid \mathbf{Z})$.

Proof: Suppose that $\mathbf{X}$ is not d-separated from $\mathbf{Y}$ by $\mathbf{Z}$ in $G$. Then there is some $X \in \mathbf{X}$
and $Y \in \mathbf{Y}$ such that $X$ is not d-separated from $Y$ by $\mathbf{Z}$ in $G$. I construct a cpm
$\mathrm{Pl} \in \mathcal{PL}_{D,\oplus,\otimes}$ such that $I^{rv}_{\mathrm{Pl}}(X, Y \mid \mathbf{Z})$ does not hold, using the techniques of [Geiger, Verma,
and Pearl 1990].

As shown in [Geiger, Verma, and Pearl 1990, Lemma 9], if $X$ is not d-separated from $Y$ by $\mathbf{Z}$
in $G$, there exists a subgraph $G'$ of $G$ such that

1. G' includes all the nodes in G but only a subset of the edges in G, 

2. $X$ is not d-separated from $Y$ by $\mathbf{Z}$ in $G'$,

3. the edges $E'$ in $G'$ consist only of those specified below:

(a) a trail $q$ from $X$ to $Y$,

(b) for every head-to-head node $X_i$ on the trail $q$, there is a directed path $p_i$ in $G'$ to
a node in $\mathbf{Z}$; moreover, the paths $p_i$ do not share any nodes and the only node
that $p_i$ shares with $q$ is $X_i$.

Note that every node in $G'$ has either 0, 1, or 2 parents in $G'$. Let $(G', f)$ be a quantitative
Bayesian network such that for each node $X$ in $G'$ with no parents in $G'$, the cpt $f(X)$
is such that $d_{X=0} = d$ and $d_{X=1} = d'$. If a node $X$ in $G'$ has one parent, say $X'$, then the
cpt $f(X)$ is such that $d_{X=i \mid X'=j}$ is $\top$ if $i = j$ and $\bot$ if $i \ne j$. Finally, if $X$ has two parents,
say $X'$ and $X''$, the cpt $f(X)$ is such that $d_{X=k \mid X'=i \cap X''=j}$ is $\top$ if $k = i \oplus j \pmod 2$ and is $\bot$
otherwise. Since $d \oplus d' = \top$ and BN2 guarantees that $\top \oplus \bot = \top$, the construction satisfies
R1. The richness of $D$ guarantees that R2 holds. By Theorem 5.6, there is a (unique)
plausibility measure $\mathrm{Pl} \in \mathcal{PL}_{D,\oplus,\otimes}$ that is represented by $(G', f)$. It is easy to check that
$\mathrm{Pl}$ is compatible with $G$ as well. There are three cases to consider:

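• Suppose that $X$ has no parents in $G'$. Then it is easy to see that $I^{rv}_{\mathrm{Pl}}(X, \mathbf{Y} \mid \mathbf{Z})$ for all
$\mathbf{Y}$ and $\mathbf{Z}$ (and, in particular, if $\mathbf{Y} = \mathrm{ND}_G(X)$ and $\mathbf{Z} = \mathrm{Par}_G(X)$).

• Suppose that $X$ has one parent in $G'$, say $X'$. Then it is easy to see that $I^{rv}_{\mathrm{Pl}}(X, \mathbf{Y} \mid \mathbf{Z})$
holds for all $\mathbf{Y}$ and $\mathbf{Z}$ such that $X' \in \mathbf{Z}$. Since $X'$ is a parent of $X$ in $G$, again
$I^{rv}_{\mathrm{Pl}}(X, \mathrm{ND}_G(X) \mid \mathrm{Par}_G(X))$ must hold.

• Finally, if $X$ has two parents in $G'$, say $X'$ and $X''$, then it is easy to see that
$I^{rv}_{\mathrm{Pl}}(X, \mathbf{Y} \mid \mathbf{Z})$ holds for all $\mathbf{Y}$ and $\mathbf{Z}$ such that $\{X', X''\} \subseteq \mathbf{Z}$. Since $X'$ and $X''$ are
parents of $X$ in $G$, again $I^{rv}_{\mathrm{Pl}}(X, \mathrm{ND}_G(X) \mid \mathrm{Par}_G(X))$ must hold. ∎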
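
The cpts used in this construction can be tried out in the probabilistic instance, where $\top = 1$, $\bot = 0$, and $\oplus$ is ordinary addition. The sketch below is an assumption-laden illustration rather than part of the proof: it takes $G'$ to be $X \to M \leftarrow Y$ with a directed path $M \to S$ and $\mathbf{Z} = \{S\}$, and picks $d = d' = 1/2$. With these choices, $X$ and $Y$ are independent unconditionally but become dependent once $S$ is observed.

```python
from fractions import Fraction as Fr
from itertools import product

TOP, BOT = Fr(1), Fr(0)
d, d_prime = Fr(1, 2), Fr(1, 2)     # assumed root values with d + d' = TOP

root = {0: d, 1: d_prime}           # cpt of a node with no parents


def copy_cpt(i, j):
    """One parent: the child copies the parent's value."""
    return TOP if i == j else BOT


def xor_cpt(k, i, j):
    """Two parents: the child is the sum of the parents' values mod 2."""
    return TOP if k == (i + j) % 2 else BOT


# Worlds are tuples (x, y, m, s) for the graph X -> M <- Y, M -> S.
joint = {(x, y, m, s): root[x] * root[y] * xor_cpt(m, x, y) * copy_cpt(s, m)
         for x, y, m, s in product((0, 1), repeat=4)}


def prob(event):
    return sum((p for w, p in joint.items() if event(w)), Fr(0))


def cond(u, v):
    """P(u | v); callers must ensure prob(v) > 0."""
    return prob(lambda w: u(w) and v(w)) / prob(v)


x0 = lambda w: w[0] == 0
y0 = lambda w: w[1] == 0
s0 = lambda w: w[3] == 0
y0_and_s0 = lambda w: y0(w) and s0(w)

# Without evidence on S, learning Y = 0 does not change the plausibility of X = 0.
unconditional = (cond(x0, y0), prob(x0))

# Given S = 0, learning Y = 0 pins down X = 0, so independence fails.
given_s = (cond(x0, y0_and_s0), cond(x0, s0))
```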



Acknowledgments 

A preliminary version of this paper appears in Uncertainty in Artificial Intelligence,
Proceedings of the Sixteenth Conference, 2000. I thank Serafín Moral, Fabio Cozman, Peter
Walley, and the anonymous referees of the UAI version of this paper for very useful com- 
ments. This work was supported in part by the NSF, under grant IRI-96-25901. 

References 

Campos, L. and J. F. Huete (1993). Independence concepts in upper and lower proba- 
bilities. In B. Bouchon-Meunier, L. Valverde, and R. R. Yager (Eds.), Uncertainty in 
Intelligent Systems, pp. 85-96. Amsterdam: North-Holland. 

Campos, L. and J. F. Huete (1999a). Independence concepts in possibility theory: Part I.
Fuzzy Sets and Systems 103(1), 127-152.

Campos, L. and J. F. Huete (1999b). Independence concepts in possibility theory: Part II.
Fuzzy Sets and Systems 103(3), 487-505.

Campos, L. and S. Moral (1995). Independence concepts for sets of probabilities. In 
Proc. Eleventh Conference on Uncertainty in Artificial Intelligence (UAI '95), pp. 
108-115. 

Couso, I., S. Moral, and P. Walley (1999). Examples of independence for imprecise
probabilities. In Proc. First Intl. Symp. Imprecise Probabilities and Their Applications,
pp. 121-130. 

Cozman, F. G. (1998). Irrelevance and independence relations in Quasi-Bayesian net- 
works. In Proc. Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI 
'98), pp. 89-96. 

Cozman, F. G. (2000a). Credal networks. Artificial Intelligence 120(2), 199-233. 

Cozman, F. G. (2000b). Separation properties of sets of probability measures. In
Proc. Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI 2000). 

Cozman, F. G. and P. Walley (1999). Graphoid properties of epistemic irrelevance and 
independence. Unpublished manuscript. 

Darwiche, A. (1992). A Symbolic Generalization of Probability Theory. Ph. D. thesis, 
Stanford University. 






Darwiche, A. and M. L. Ginsberg (1992). A symbolic generalization of probability theory. 
In Proceedings, Tenth National Conference on Artificial Intelligence (AAAI '92), pp. 
622-627. 

Darwiche, A. and M. Goldszmidt (1994). On the relation between kappa calculus and 
probabilistic reasoning. In Proc. Tenth Conference on Uncertainty in Artificial Intel- 
ligence (UAI '94), pp. 145-153. 

Dubois, D., L. Farinas del Cerro, A. Herzig, and H. Prade (1994). An ordinal view of 
independence with applications to plausible reasoning. In Proc. Tenth Conference on 
Uncertainty in Artificial Intelligence (UAI '94), pp. 195-203. 

Dubois, D. and H. Prade (1990). An introduction to possibilistic and fuzzy logics. In 
G. Shafer and J. Pearl (Eds.), Readings in Uncertain Reasoning, pp. 742-761. San 
Francisco, Calif.: Morgan Kaufmann. 

Finetti, B. de (1936). Les probabilités nulles. Bulletin des Sciences Mathématiques
(première partie) 60, 275-288.

Fonck, P. (1994). Conditional independence in possibility theory. In Proc. Tenth Confer- 
ence on Uncertainty in Artificial Intelligence (UAI '94), pp. 221-226. 

Friedman, N. and J. Y. Halpern (1995). Plausibility measures: a user's guide. In 
Proc. Eleventh Conference on Uncertainty in Artificial Intelligence (UAI '95), pp. 
175-184. 

Geiger, D. and J. Pearl (1988). On the logic of causal models. In Proc. Fourth Workshop 
on Uncertainty in Artificial Intelligence (UAI '88), pp. 136-147. 

Geiger, D., T. Verma, and J. Pearl (1990). Identifying independence in Bayesian networks.
Networks 20, 507-534. 

Gilboa, I. and D. Schmeidler (1993). Updating ambiguous beliefs. Journal of Economic 
Theory 59, 33-49. 

Goldszmidt, M. and J. Pearl (1992). Rank-based systems: A simple approach to belief 
revision, belief update and reasoning about evidence and actions. In Principles of 
Knowledge Representation and Reasoning: Proc. Third International Conference (KR 
'92), pp. 661-672. San Francisco, Calif.: Morgan Kaufmann. 

Halpern, J. Y. (2000). Conditional plausibility measures and Bayesian networks. In 
Proc. Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI 2000), pp. 
247-255. 

Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan. 

Levi, I. (1985). Imprecision and uncertainty in probability judgment. Philosophy of Sci- 
ence 52, 390-406. 

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Francisco, Calif.: 
Morgan Kaufmann. 

Popper, K. R. (1968). The Logic of Scientific Discovery (revised edition). London:
Hutchinson. The first version of this book appeared as Logik der Forschung, 1934.






Rényi, A. (1964). Sur les espaces simples de probabilités conditionnelles. Annales de
l'Institut Henri Poincaré, Nouvelle série, Section B 1, 3-21. Reprinted as paper 237 in
Selected Papers of Alfréd Rényi, III: 1962-1970, Akadémiai Kiadó, 1976, pp. 284-302.

Shenoy, P. P. (1994). Conditional independence in valuation based systems. International 
Journal of Approximate Reasoning 10, 203-234. 

Spohn, W. (1988). Ordinal conditional functions: a dynamic theory of epistemic states. 
In W. Harper and B. Skyrms (Eds.), Causation in Decision, Belief Change, and 
Statistics, Volume 2, pp. 105-134. Dordrecht, Netherlands: Reidel. 

Verma, T. (1986). Causal networks: semantics and expressiveness. Technical Report R- 
103, UCLA Cognitive Systems Laboratory. 

Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities, Volume 42 of Mono- 
graphs on Statistics and Applied Probability. London: Chapman and Hall. 

Weydert, E. (1994). General belief measures. In Proc. Tenth Conference on Uncertainty 
in Artificial Intelligence (UAI '94), pp. 575-582. 

Wilson, N. (1994). Generating graphoids from generalized conditional probability. In 
Proc. Tenth Conference on Uncertainty in Artificial Intelligence (UAI '94), pp. 583-591.

Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and 
Systems 1, 3-28. 






